Open cabbagesofdoom opened 10 months ago
Oh yea that's my bad.
Let me fix that and get back to you.
Hey,
Any chance you could show me the header columns of your data?
The first 2 lines above the actual reads (and below the header values)
It should be something like
#char* uint32_t double double double double uint64_t int16_t* enum{unknown,partial,mux_change,unblock_mux_change,data_service_unblock_mux_change,signal_positive,signal_negative} char* double int32_t uint8_t uint64_t
#read_id read_group digitisation offset range sampling_rate len_raw_signal raw_signal end_reason channel_number median_before read_number start_mux start_time
What I'm looking for here is this part of it
enum{unknown,partial,mux_change,unblock_mux_change,data_service_unblock_mux_change,signal_positive,signal_negative}
This is the list that blue-crab tried to get from your slow5 file. If it's not present or fails, it tries to make it a list of just ["unknown"].
It looks like it's trying to use a value that is outside the length of the list. So having a look at the list is a good start to see if there is anything weird going on there.
An easy way to get that value from a blow5 file is to run this command
slow5tools view reads.blow5 | less
and just scroll down to that header line and copy paste it here.
Thanks James
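For anyone following along, here is a minimal sketch of the lookup described above, assuming the labels are simply the comma-separated names inside enum{...} on the type header line. The helper names parse_end_reason_labels and label_for are hypothetical and are not blue-crab's actual code; the fallback to ["unknown"] mirrors the behaviour described in the comment.

```python
import re

def parse_end_reason_labels(type_line):
    """Return the labels inside enum{...} on a SLOW5 type header line.

    Falls back to ["unknown"] when no enum is declared (hypothetical helper).
    """
    match = re.search(r"enum\{([^}]*)\}", type_line)
    if match is None:
        return ["unknown"]
    labels = [label for label in match.group(1).split(",") if label]
    return labels or ["unknown"]

def label_for(index, labels):
    """Map a stored enum index to its label, guarding against out-of-range values."""
    if 0 <= index < len(labels):
        return labels[index]
    return "unknown"

# Shortened header lines, for illustration only.
with_enum = "#char* uint32_t double enum{unknown,partial,mux_change} char*"
without_enum = "#char* uint32_t double double char*"

labels_a = parse_end_reason_labels(with_enum)
labels_b = parse_end_reason_labels(without_enum)
print(labels_a, label_for(2, labels_a))
print(labels_b, label_for(5, labels_b))
```

With a guard like label_for, an index outside the list degrades to "unknown" instead of raising an error.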
I have also just pushed a change to the dev branch that adds a check on this line of code. If it fails, it will print what slow5_end_reason_labels is set to, as a quick way to troubleshoot.
So another way is to switch to the dev branch, run pip install . again, then run the same conversion and wait for it to hit the same error.
Thanks James
Hi @Psy-Fer,
I am also getting the same error:
blue-crab s2p minion_sim_1000_itrs.blow5 -o minion_test.pod5
05-Apr-24 19:03:38 - blue-crab - [INFO]: single2single: 1 s/blow5 file detected as input. Writing 1:1 s/blow5->pod5 to file: minion_test.pod5
05-Apr-24 19:03:38 - blue-crab - [INFO]: Opening s/blow5 file: minion_sim_1000_itrs.blow5
[slow5_get_aux_enum_labels::ERROR] No enum auxiliary type exists. At src/slow5.c:1458
05-Apr-24 19:03:38 - pyslow5 - [WARNING]: get_aux_enum_labels enum_labels is NULL
Traceback (most recent call last):
File "/home/tomas/.local/bin/blue-crab", line 8, in
The last two lines before the actual reads are:
There is no enum{} in my files.
The slow5 files were generated (using the subprocess.run function of python) with the dna-r10-min model and full-contigs:
"squigulator " + fastaderep + " -x dna-r10-min -o ./tmp/tmp" + str(i) + ".slow5 --full-contig --seed " + str(random_numbers[i])
and then merged:
slow5tools merge tmp -o minion_sim_1000_itrs.slow5
The individual tmp files as well as the merged files have the same structure and no enum{} on line 9.
I tried the dna-r9-min model, and there also is no enum{} on line 9.
Thanks,
Tomas
Ahh so these reads were built with squigulator?
I'll need to tell @hasindu2008 to put a dummy end_reason in the blow5 output.
In the meantime, I'll modify blue-crab to insert a dummy enum via an argument, making all reads end in the signal_positive state.
I'll get back to you in a sec
James
Hi Tomas,
Could you please try using the dev branch and showing me the error it gives you?
You can do this by activating your environment.
If you installed with pip from PyPI, please clone the blue-crab repo:
git clone git@github.com:Psy-Fer/blue-crab.git
Then go to the blue-crab repo and run git pull
then git switch dev
You can check it worked by running git status
and it should say something like
On branch dev
Your branch is up to date with 'origin/dev'.
Then re-install this dev version into your env
pip install .
Now re-run your blue-crab command.
Something fishy is going on, but this should help figure it out.
Cheers, James
Hi James,
Thanks for looking into this.
Here is the output of the dev branch:
blue-crab s2p minion_sim_1000_itrs.blow5 -o minion_test.pod5
10-Apr-24 09:04:21 - blue-crab - [INFO]: single2single: 1 s/blow5 file detected as input. Writing 1:1 s/blow5->pod5 to file: minion_test.pod5
10-Apr-24 09:04:21 - blue-crab - [INFO]: Opening s/blow5 file: minion_sim_1000_itrs.blow5
[slow5_get_aux_enum_labels::ERROR] No enum auxiliary type exists. At src/slow5.c:1458
10-Apr-24 09:04:21 - pyslow5 - [WARNING]: get_aux_enum_labels enum_labels is NULL
Traceback (most recent call last):
File "/home/tomas/.local/bin/blue-crab", line 8, in
Ahh progress!
Okay so now the issue is the readID isn't a valid uuid. Again I think that's a squigulator issue.
@hasindu2008 what are the readIDs you make?
The issue here is that pod5 requires the readID to be a uuid. So I can't just use any old string.
Ideally squigulator would create these and then blue-crab just reads the string and converts it.
Another option is in the absence of valid uuids I add an option to create one. But then you can't link the old reads to the new reads (unless I make a tsv file that provides the mapping).
What do you think?
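A minimal sketch of the second option, assuming a deterministic UUID (uuid5) derived from the original readID is acceptable and that a two-column TSV is enough for the mapping. The namespace string, the helper names and the sample read IDs are all made up for illustration; this is not what blue-crab or squigulator actually do.

```python
import csv
import io
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "squigulator.example")

def to_uuid(read_id):
    """Return read_id as a UUID, deriving one deterministically if it is not already valid."""
    try:
        return uuid.UUID(read_id)
    except ValueError:
        return uuid.uuid5(NAMESPACE, read_id)

def write_mapping(read_ids, handle):
    """Write a TSV that links each original read ID to the UUID used in its place."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["original_read_id", "uuid"])
    for read_id in read_ids:
        writer.writerow([read_id, str(to_uuid(read_id))])

# Made-up read IDs, written to an in-memory buffer instead of a file.
buffer = io.StringIO()
write_mapping(["sim_read_1", "sim_read_2"], buffer)
print(buffer.getvalue())
```

Because uuid5 is a hash of the namespace and the name, re-running the conversion on the same readID gives the same UUID, so the TSV can be regenerated at any time.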
James,
I agree, the solution is to have valid UUID and a dummy end_reason in the slow5/blow5 output generated by squigulator @hasindu2008.
Not having this also most likely breaks the buttery-eel wrapper.
Ultimately, I need to be able to basecall the simulated slow5/blow5 files generated by squigulator so I can use the called fastq files for downstream analyses.
Thanks,
Tomas
Buttery-eel I can unbreak by using dummy uuids when I basecall and then restoring the original readID when the read comes back.
The issue is going over to pod5 you can't do this because of their strict typing. So yea, either squigulator produces uuids or I create them in blue-crab and give a file that maps squigulator readIDs with uuids.
Let's see what @hasindu2008 thinks and then we will implement it asap
James
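The buttery-eel workaround described above could look roughly like this. It is only a sketch: basecall_with_dummy_ids and fake_basecall are hypothetical stand-ins and not buttery-eel's real client code, and it assumes the basecaller hands back the same ID it was given.

```python
import uuid

def basecall_with_dummy_ids(reads, basecall):
    """Swap read IDs for random UUIDs before basecalling and restore them afterwards.

    reads: list of (read_id, signal) tuples.
    basecall: callable taking (uuid_str, signal) and returning (uuid_str, sequence);
              a stand-in for whatever the basecaller client does.
    """
    original_for = {}
    results = []
    for read_id, signal in reads:
        dummy = str(uuid.uuid4())
        original_for[dummy] = read_id
        returned_id, sequence = basecall(dummy, signal)
        # Look the original ID back up before writing the result out.
        results.append((original_for[returned_id], sequence))
    return results

def fake_basecall(read_id, signal):
    """Pretend basecaller that rejects non-UUID read IDs."""
    uuid.UUID(read_id)  # raises ValueError if the ID is not a UUID
    return read_id, "ACGT"[: len(signal)]

results = basecall_with_dummy_ids(
    [("sim_read_1", [1, 2, 3]), ("sim_read_2", [4])], fake_basecall
)
print(results)
```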
Hey all,
The reason I adhere to the current readID format in squigulator is so that it is compatible with the "mapeval" utility in Minimap2's Paftools companion script. This is quite useful for assessing the mapping accuracy once the reads are basecalled. Also, I like deterministic read IDs compared to random ones.
It is very strange that POD5 needs the readID to be a UUID. Perhaps in their implementation they simply store the UUID as a 128-bit integer instead of storing it as a variable-length string. This is not great, as it means POD5 is stuck with UUIDs as read IDs forever; if the read ID format changes later, it will break backward compatibility. The read ID in many bioinformatics formats, including BAM, has always been a variable-length string.
Perhaps I can implement an option in squigulator called --ont-friendly that produces fake UUIDs for the read IDs, as well as a fake end_reason with the value "unknown". Let me know your thoughts on this. This way, there is no need for blue-crab to do any "UUIDification" of the readIDs. If you are all happy, I can implement this in squigulator ASAP.
By the way, @Psy-Fer, is this UUID thing applicable to buttery-eel too? It wasn't a problem when using ont-guppy-server with the eel. Perhaps they enforced this UUID in ont-dorado-server? If they have enforced it (which is of limited sense to me), I would be very glad if you could do some internal mapping with a fake uuid when sending to the ont-basecall-server, but write the original readID to the FASTQ/SAM.
Also cross-referencing to the issue in squigulator that raises the same issue: https://github.com/hasindu2008/squigulator/issues/13
Hey,
Okay I'll just make absolutely sure what pod5 is doing so we are 100% correct when we do this.
James
@hasindu2008 and @Psy-Fer
Perhaps, I (@hasindu2008) can implement Squigulator an option called --ont-friendly that produces some fake UUIDs for the read IDs, as well as a fake end_reason with the value "unknown". Let me know your thoughts on this. This way, there is no need for the blue crab to do any "UUIdification" of the readIDs. If you all are happy, I can implement this to squigulator ASAP.
I think this is a great solution that will maintain maximum compatibility for downstream use.
Thanks,
Tomas
Okay I have confirmed that pod5 requires a uuid type for the readID, even though it shouldn't have to be.
--s2p--
verbose=1
-------------------blue-crab version-------------------
SLOW5/BLOW5 <-> POD5 converter version: 0.1.0
-------------------testcase:1: .slow5 to .pod5-------------------
12-Apr-24 17:35:30 - blue-crab - [INFO]: single2single: 1 s/blow5 file detected as input. Writing 1:1 s/blow5->pod5 to file: ./test//data/out/s2p/a.pod5
12-Apr-24 17:35:30 - blue-crab - [INFO]: Opening s/blow5 file: ./test//data/raw/s2p/a.slow5
12-Apr-24 17:35:30 - pyslow5 - [WARNING]: get_header_value header value not found: ip_address - rg: 0
12-Apr-24 17:35:30 - pyslow5 - [WARNING]: get_header_value header value not found: mac_address - rg: 0
Traceback (most recent call last):
File "/home/jamfer/pvenv/blue-crab-test/bin/blue-crab", line 8, in <module>
sys.exit(main())
File "/home/jamfer/pvenv/blue-crab-test/lib/python3.8/site-packages/src/blue_crab.py", line 1561, in main
slow52pod5(args)
File "/home/jamfer/pvenv/blue-crab-test/lib/python3.8/site-packages/src/blue_crab.py", line 713, in slow52pod5
s2s_s2p_worker(args, sfile, pod5_out)
File "/home/jamfer/pvenv/blue-crab-test/lib/python3.8/site-packages/src/blue_crab.py", line 1392, in s2s_s2p_worker
writer.add_read(read)
File "/home/jamfer/pvenv/blue-crab-test/lib/python3.8/site-packages/pod5/writer.py", line 256, in add_read
self.add_reads([read])
File "/home/jamfer/pvenv/blue-crab-test/lib/python3.8/site-packages/pod5/writer.py", line 292, in add_reads
*self._prepare_add_reads_args(reads),
File "/home/jamfer/pvenv/blue-crab-test/lib/python3.8/site-packages/pod5/writer.py", line 306, in _prepare_add_reads_args
[np.frombuffer(read.read_id.bytes, dtype=np.uint8) for read in reads]
File "/home/jamfer/pvenv/blue-crab-test/lib/python3.8/site-packages/pod5/writer.py", line 306, in <listcomp>
[np.frombuffer(read.read_id.bytes, dtype=np.uint8) for read in reads]
AttributeError: 'str' object has no attribute 'bytes'
testcase 1 failed
This is what happens if we just pass a str.
It's trying to access the bytes attribute on the uuid type specifically, as that is what they expect.
So yea, I think we need to go with dummy uuids, and just make a tsv file that maps the uuid with the more verbose read information you want to store.
James
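To illustrate the traceback above: Python's uuid.UUID exposes a 16-byte .bytes attribute, while a plain str does not, which is why the writer fails. A minimal sketch of a guard that could run before handing a read to the writer; as_read_uuid is a hypothetical helper, not part of pod5 or blue-crab.

```python
import uuid

def as_read_uuid(read_id):
    """Coerce a read ID to uuid.UUID, raising a clear error for non-UUID values."""
    if isinstance(read_id, uuid.UUID):
        return read_id
    try:
        return uuid.UUID(read_id)
    except (ValueError, AttributeError, TypeError) as exc:
        raise ValueError(f"read ID {read_id!r} is not a valid UUID") from exc

valid = as_read_uuid("0f5d9b4e-7c1a-4c3e-9b2a-1d2e3f4a5b6c")
print(len(valid.bytes))                  # 16 bytes, i.e. 128 bits
print(hasattr("some_read_id", "bytes"))  # False: a plain str has no .bytes

try:
    as_read_uuid("sim_read_1")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```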
@Psy-Fer I am implementing an option in squigulator to generate uuids for readids, so blue-crab does not need to do anything.
Please check if the buttery-eel is also broken due to this uuid thing?
Buttery-eel should be fine, unless they change something in the dorado server code. See https://github.com/Psy-Fer/buttery-eel/issues/32: I used to think it was an issue, but it turned out to be just a change in how dorado-server handles reads that are too short.
I should probably merge the buttery-eel/skipped branch into main and do a release to handle this.
@killidude
If you compile squigulator from the dev branch and specify the option --ont-friendly=yes, the output should be compatible with pod5 conversion.
When you specify --ont-friendly=yes, it will add a dummy end_reason and create fake UUIDs for the read IDs.
If you encounter issues let me know, thanks.
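For the subprocess-based workflow shown earlier in the thread, a minimal sketch of building the same command as an argument list with the new option appended. Only flags that appear in this thread are used; squigulator_command and the file names are hypothetical.

```python
import shlex

def squigulator_command(fasta, out_path, seed, profile="dna-r10-min"):
    """Build the squigulator argument list; passing a list avoids shell quoting issues."""
    return [
        "squigulator", fasta,
        "-x", profile,
        "-o", out_path,
        "--full-contig",
        "--seed", str(seed),
        "--ont-friendly=yes",
    ]

cmd = squigulator_command("ref.fasta", "./tmp/tmp0.slow5", 42)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```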
Seems like buttery-eel works even without things being uuid as James mentioned above.
@hasindu2008,
Thanks for implementing this option. I can now convert the squigulator generated files to pod5.
Thanks for your help,
Tomas
Hi @Psy-Fer ,
I'm using squigulator (v0.4.0) with the --ont-friendly=yes parameter and blue-crab (v0.2.0).
The error occurs during the conversion of a squigulator .slow5 file to .pod5. Here’s the error traceback:
04-Oct-24 16:12:25 - blue-crab - [INFO]: single2single: 1 s/blow5 file detected as input. Writing 1:1 s/blow5->pod5 to file: test.pod5
04-Oct-24 16:12:25 - blue-crab - [INFO]: Opening s/blow5 file: squigulator_reads.slow5
Traceback (most recent call last):
File "/X.local/bin/blue-crab", line 8, in <module>
sys.exit(main())
File "/X/.local/lib/python3.8/site-packages/src/blue_crab.py", line 1562, in main
slow52pod5(args)
File "/X/.local/lib/python3.8/site-packages/src/blue_crab.py", line 717, in slow52pod5
s2s_s2p_worker(args, sfile, pod5_out)
File "/X/.local/lib/python3.8/site-packages/src/blue_crab.py", line 1195, in s2s_s2p_worker
reason, forced = s2p_end_reason_convert(s5_end_reason)
File "/X/.local/lib/python3.8/site-packages/src/blue_crab.py", line 94, in s2p_end_reason_convert
"api_request": (p5.EndReasonEnum.API_REQUEST, False),
File "/usr/lib/python3.8/enum.py", line 384, in __getattr__
raise AttributeError(name) from None
AttributeError: API_REQUEST
I suspect this issue might be related to a recent pull request based on the new pod5 spec from about a month ago. Is there a way to avoid this error?
Hmm, can you make sure you have the latest pod5 version?
Which version do you have? Could you run pip list for me?
Thank you for the fast answer, upgrading pod5 fixed the problem!
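The AttributeError above is what a Python Enum raises when a member name does not exist, which fits an older pod5 that predates API_REQUEST. Upgrading is the real fix; purely as an illustration, a lookup table can be built defensively like this. OldEndReason is a stand-in enum, not pod5's EndReasonEnum, and build_end_reason_table is hypothetical.

```python
from enum import Enum

class OldEndReason(Enum):
    """Stand-in for an older enum that lacks the newer API_REQUEST member."""
    UNKNOWN = 0
    SIGNAL_POSITIVE = 1

def build_end_reason_table(enum_cls, names):
    """Map lowercase labels to enum members, falling back to UNKNOWN when one is missing."""
    table = {}
    for name in names:
        table[name] = getattr(enum_cls, name.upper(), enum_cls.UNKNOWN)
    return table

table = build_end_reason_table(OldEndReason, ["signal_positive", "api_request"])
print(table)
```

To see which version is installed without scanning pip list, importlib.metadata.version("pod5") from the standard library (Python 3.8+) returns the version string.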
Hi @Psy-Fer,
I am trying to convert some blow5 files to pod5 and get this error:
Any ideas of what might cause this and how I might fix it?
Thanks!
Rich