Pablo-Aja-Macaya opened this issue 1 year ago.
I have the "same" problem:
Getting all reads from /mnt/sdc1/reports/2023_04_20_10_53_57_le_kuniq_standantet2tus_O.txt matched to the following taxa: 2645622, 1318743, 774, 2759660, 1686310
Found 1 reads for 774
Found 1 reads for 1318743
Found 65 reads for 1686310
Found 0 reads for 2645622
Found 8 reads for 2759660
Extracting 75 reads from FASTQ file /mnt/fastq/test.gz
Number of extracted reads: 0
Even with taxid 1812935 I had 1 read, but nothing got exported :( I ran sudo apt-get update and sudo apt-get upgrade with no luck, and I also updated Perl, but the error persisted. Then I removed the line linked in the issue (https://github.com/fbreitwieser/krakenuniq/blob/d71fd574ea9c79502b650ef1b66cc88ed5f387f7/scripts/krakenuniq-extract-reads#L150), and that worked for me.
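A quick way to narrow down this kind of failure is to check whether the read IDs that KrakenUniq assigned to the selected taxids appear verbatim in the FASTQ headers. The sketch below is a minimal, hypothetical diagnostic (not part of KrakenUniq). It assumes the per-read classification output is tab-separated with the C/U flag in column 1, the read ID in column 2 and the taxid in column 3, and that the FASTQ file uses plain 4-line records.

```python
import gzip
from typing import Iterable, Set


def classified_ids(kraken_lines: Iterable[str], taxids: Set[str]) -> Set[str]:
    """Read IDs classified ('C') to any of the given taxids.

    Assumes Kraken-style per-read output: flag, read ID, taxid, ... (tab-separated).
    """
    ids = set()
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[0] == "C" and fields[2] in taxids:
            ids.add(fields[1])
    return ids


def fastq_ids(fastq_lines: Iterable[str]) -> Set[str]:
    """First whitespace-delimited token of every header line (every 4th line), without '@'."""
    ids = set()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0 and line.startswith("@"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def missing_ids(kraken_path: str, fastq_path: str, taxids: Set[str]) -> Set[str]:
    """Classified IDs that have no exact match among the FASTQ headers."""
    with open(kraken_path) as kraken:
        wanted = classified_ids(kraken, taxids)
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as fastq:
        present = fastq_ids(fastq)
    return wanted - present
```

For example, `missing_ids("sample_output.txt", "reads.fastq.gz", {"10310"})` (hypothetical file names): a non-empty result means the IDs already differ between the two files, while an empty result means every classified ID is present verbatim and the loss happens inside the extraction step.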
Summary
Hello,
I have observed a possible problem (or unexpected behavior) while using krakenuniq-extract-reads on reads downloaded from SRA: it seems that read IDs containing dots (.) are not written to the final output, and the issue is fixed if the dots are replaced by dashes (-) or if a certain line in the script is removed. Maybe properly formatted IDs are expected from the user, but the tool gives no warning or error, and SRA returns IDs of this type by default.
I think it might be related to this line in krakenuniq-extract-reads. If it is commented out, the problem is solved in this case (although the line is probably needed for other types of IDs).
Thank you!
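One explanation consistent with the tests below, though not verified against the script itself, is an asymmetric ID normalization: if the extraction step strips a trailing mate suffix such as `.1` or `/1` from the FASTQ IDs but looks them up against the unmodified IDs from the KrakenUniq output, then `SRR17709759.50732.1` becomes `SRR17709759.50732` and never matches, while dot-free IDs (UUIDs, `this-is-unique-id`, or dashes instead of dots) pass through unchanged. The regular expression in this sketch is an assumption chosen only to illustrate that mechanism; it is not the actual line from krakenuniq-extract-reads.

```python
import re
from typing import Iterable, List

# HYPOTHETICAL normalization, for illustration only: drop a trailing
# ".1", ".2", "/1" or "/2" (a common paired-end mate suffix).
MATE_SUFFIX = re.compile(r"[./][12]$")


def normalize(read_id: str) -> str:
    """Apply the hypothetical mate-suffix stripping to one read ID."""
    return MATE_SUFFIX.sub("", read_id)


def extracted(kraken_ids: Iterable[str], fastq_ids: Iterable[str]) -> List[str]:
    """FASTQ IDs that would be kept if only the FASTQ side is normalized."""
    wanted = set(kraken_ids)  # IDs exactly as written in the KrakenUniq output
    return [rid for rid in fastq_ids if normalize(rid) in wanted]


for rid in ["SRR17709759.50732.1", "SRR17709759-50732-1", "this-is-unique-id"]:
    print(rid, "->", "extracted" if extracted([rid], [rid]) else "lost")
```

Running it prints `lost` for the dotted SRA ID and `extracted` for the other two. Under this hypothesis, normalizing both sides the same way (or not normalizing at all) would make the dotted IDs match again, which fits the observation that commenting out the line fixes the problem.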
Detected problem
The input contains two sets of ONT reads with different ID formats; examples here:
Reads with SRR-style IDs are not extracted by krakenuniq-extract-reads. The other reads are extracted when bacterial taxids are selected.
Some lines from the input:
Output (with taxid 10310, the classified reads are detected but not extracted):
If I extract taxid 1812935, which is only present in reads with IDs such as "fc7b2e9e-e175-4e4d-af58-b66879c737f2", it works:
Tests
I created a new conda environment with krakenuniq==1.0.3, and the results did not change. I include it here just in case:
Afterwards, read "SRR17709759.50732.1" was duplicated and renamed to "this-is-unique-id" (in both the KrakenUniq results and the read input), and the renamed read was correctly extracted with the same command. Input shown here:
Then, all dots in read IDs were replaced with dashes, and all the expected reads were extracted. Here is the input:
And the output:
Finally, if a line in krakenuniq-extract-reads is commented out and I use the original input (dots in read IDs), it works:
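If the dots-to-dashes workaround from the tests above is used, it has to be applied identically to the FASTQ headers and to the read-ID column of the KrakenUniq per-read output, or the two files stop matching. A minimal sketch, assuming plain 4-line FASTQ records and the read ID in column 2 of the tab-separated classification output; the function names are hypothetical. Only the ID token of each header is changed, so quality lines and header descriptions are left untouched.

```python
import gzip
import re
from typing import Callable, Iterable, Iterator

HEADER = re.compile(r"@(\S+)(.*)", re.S)  # ID token, then the rest of the line


def dedot(read_id: str) -> str:
    """Replace every '.' in a read ID with '-'."""
    return read_id.replace(".", "-")


def rename_fastq(lines: Iterable[str]) -> Iterator[str]:
    """Rewrite the ID token of every header line (every 4th line of a 4-line FASTQ)."""
    for i, line in enumerate(lines):
        m = HEADER.match(line) if i % 4 == 0 else None
        yield ("@" + dedot(m.group(1)) + m.group(2)) if m else line


def rename_kraken(lines: Iterable[str]) -> Iterator[str]:
    """Rewrite column 2 (read ID) of tab-separated per-read classification output."""
    for line in lines:
        end = "\n" if line.endswith("\n") else ""
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 1:
            fields[1] = dedot(fields[1])
        yield "\t".join(fields) + end


def rewrite(src: str, dst: str, transform: Callable[[Iterable[str]], Iterator[str]]) -> None:
    """Stream src through transform into dst; '.gz' paths are (de)compressed."""
    open_in = gzip.open if src.endswith(".gz") else open
    open_out = gzip.open if dst.endswith(".gz") else open
    with open_in(src, "rt") as fin, open_out(dst, "wt") as fout:
        fout.writelines(transform(fin))
```

For example, `rewrite("reads.fastq.gz", "reads.dedot.fastq.gz", rename_fastq)` and `rewrite("sample_output.txt", "sample_output.dedot.txt", rename_kraken)`, with hypothetical file names, produce a matching pair of inputs for krakenuniq-extract-reads.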