I think I fixed the issue by modifying the rules. I will wait and see how far I get this time. Can I fork the repo and open a pull request? Or I can post the modified file here once I have made sure that all fixes work as intended.
Hi there,
We tested the changes on our dataset and they are working well. Could you please provide us with some additional information about your headers and base caller data? This will help us determine if the issue is specific to your dataset. Additionally, could you please share the details of the changes you have made?
Thank you, Aman
That means I am lucky to have a dataset that can cause some issues! As always =). I added one line to each iteration that renames the IDs in the racon output, since the error was caused by identical IDs appearing in both the fasta files and the racon outputs in the next minimap2 run.
shell:
"""
# racon polishing round: reads, overlaps and the draft come from the rule inputs
racon -m 8 -x -6 -g -8 -w 500 -t {threads} {input[0]} {input[1]} {input[2]} > {output.tmp};
# prefix every header so the polished IDs can no longer collide with the input IDs
sed -i -e 's/>/>polished3/g' {output.tmp};
# drop everything up to and including the last '|' in each header, if present
sed 's/[>].*[^|]|/>/' {output.tmp} > {output.final}
"""
After fixing it like this, some samples already progressed to medaka with the following IDs:
>polished4polished3polished2polished18fc9f2bb-eb3f-4eef-954b-4b25ba6535bb
From the person whose dataset I am working with, I got the following info:
primers and adapters are already trimmed, and the only quality-based filtering performed was on the MinION during base calling in HAC (high-accuracy) mode (nanopore reads with Q > 8 were kept).
After receiving the files, I ran chopper:
gunzip -c $gz | chopper -q 10 -l 1200 --threads {THREADS} | gzip > $CR/$file
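For context, a minimal sketch of the loop those variables come from (the directory layout, thread count and file patterns are assumptions from my own script, not part of Natrix2):

THREADS=8
CR=chopper_filtered
mkdir -p "$CR"
for gz in raw_fastq/*.fastq.gz; do
    file=$(basename "$gz")
    # keep reads with mean quality >= 10 and length >= 1200 bp
    gunzip -c "$gz" | chopper -q 10 -l 1200 --threads "$THREADS" | gzip > "$CR/$file"
done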
Here is an example read from the original fastq file I got:
@db01deda-dc12-4612-af5e-c2a00ac9193d runid=d16a83d0f3e4f48e46f5ca136bf647f57d0f5927 sampleid=no_sample read=26672 ch=425 start_time=2023-06-07T11:21:32Z model_version_id=2021-05-17_dna_r9.4.1_minion_384_d37a2ab9 barcode=barcode06
GATGAACGCCGGCGGTGTGCTAATACATGCAAGTCGAACGCGTTGGCCCAATTGATTGATGGTGCTTTACCTGATTGATTTTTGGTCGCCAGCGAAGGTGGCGGACGGGTGGTAACACGTAGATGGCCCTGCCCAGAAGCGGGGGACAACATTTGGAAACAGATGCTAATACCGCATAACAGCGTTGTTCGCATGAACAACGCTTCTGCTTCTCGCTATCACTTCTGGATGGACCTGCGGTCATTAGCTTGTTGGTGGGGTAACGGCCTACCAAGGCGATGATGCATAGCCGAGTTGAAAACTGATCGGCCACAATGGGACTGAGACACGGCCCATACTCCTACGGGAGGCAGCAGTAGGGAATCTTCTACAATGGGCGCAAGCCTGATGAACAACACCGCGTGAGTGAAGAAGGTTTTCGGCTCGTAAAGCTCTGTTGTTAAAGAAGAACACGTATAGAGTAACTGTTCATACGTTGACGGTATTTAACCAGAAAGTCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCTTACTTATCCGGATTTATTGGGCGTAAAGAGAGTGCAGGCGGTTTTCTAAGTCTGATGAAAAGCCTTCGGTAACCGGAGAAGTGCATCGGAAACTGGATAACTTGAGTGCAGAAGGGTAAATTGGAACTCCATGTGTAGCGGTGGAATGCGTAGATATATATGGAAGAACACCAGTGGCGAAGGCAACGGCTACCTGGTCTGCAACTGACGCTGAGACTCAACATGGAGTAGCGAACAGGATTAGATACCCCTGTCCATGCCGTAAACGATGAGTGCTAGGTGTTGGAGGTTTCCGCCCTTCAGTGCCGGGCTAACGCATTCGCCTCCGCCTGGGGGTACGACTGCAAGTTCAAACTCCAAAGAAGAGATTCGACGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCTACGCGAAAAAGTGCCCGCCTGACATCTTGCGCAGCAGAGATGGGGCGTTTCAGGAACGCGCAATGACGGTGGTGCATGGTCGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAGCAGCACAACCCTTGTTACTAGTTGCCAGCATTAAGTTGGGCACTCTAGAGAGACCGCCGGTGACCGGAGGAAGGTGGGGATGACGTCAAGTCATCATGCCCCTTATGACCTGGGCTACACACGTGCTGCAATGGGCAGTACAACGAGAAGCGAACCCGCGAGGGTAGCGGATCTCTTAAAGCTGTTCTCAGTTCGGACTGCAGGCTGCAACTCGCCTGCACGAAGCTGGAATCGCTAGTAATCGCGGATCAGCACGCCGCGGTGAATACGTTCCCGGGCGCGCACACCGCCCCGTCACACCATGGAAGTCTGCAATGCCCAAAGTCGGTGGGATAACCTTTGCCGTCAATAGCCGCCTAAGGTAAGAATTGACTGGGGTG
+
-.@;:::8:97551111940+++)))*-255568:55667<;;=>>==<==;>@A@<)(&'(*+++/.144554435<<=9644434(''&(+((****''*,,13.)()**.012--,1/*&&&%*,(,--,),--000***078::-,,-<=>0-154447>?@<?>>?;;9;:9::>6,,,,<968:11//02?@<100/.'&&'))*,,-0/../37789<5598;;54435/600/(''40//,(&')*156>BA=>A=A??=:(((();>:;::::?/...99=<<9999:60-((((8;?;411++,.8;??A<5454676:569:767668==:<;259:854400--//.-20:9;987,***-<=>9966:54464720)''''23578<?<=<65579,/0+*))-17899:79--.../&&'+/122:C@B:</0+.,**,,9<<%$$'%(*+34>?>8888?<<=EA;::5443333?++*+///1(''.5443588;8843,,,,,,-002=7370/..(&%%&()(''''&&,-.70+-,-44@@A::9:>>><88/.-+,.9;;;<<?>;@;?32<.7<21//($$$$$%&')+'%$%(.1.01--)))))>>@//688679?A97.-*+,,-9:<;988**((+)'())*2456;7&&&%+-12433459;?<<@((&&%&)..1{3:=A<69854785429::7841.+(%%%&,-565585563347;??>?>==;9)((+36'&&''7:.-++888778/..-123::;0///5669=:6943304447:,++,,56:<;:869:99988:9);====<:2,+*))*/695500((/9<=222'&&%&'5.89:;654*%%()+-656/('%%'%%&(*+&')-10,)*('&'')*+-,-23:21324;956698:::8854***+<==<;:=<43532226''''-..++*028,.0-+,/24455A=<11../'&&*')*,,34546..-+,,:97726886899(&&&'13677<=>><;:338456790.-+*('()+12=@??=:::<42423,,--**(''*../01***76967+)))*&&%%%%&56641444><<---0;444{858114::78800.((.48235646978:;<98///>0/58<787,,++)(()))*0212'322397768{:34;;:+**,8::=>=<>>?;333..-,/4+*))*06>=>>:0//''225;<9;{D4434525677..,..6:><<;;;9:9:<<;<A:::9;<<<>>::22--/-----,6<>:999=<=@46867;><6667;=?3?<:998520001336500/0*+-)'&))/+632048=33-,,((((-232:0((('9><><??,.:><66669=>==79775*))$$$&()))-05:;:<0,,..*'&$%&'),+,//100
I am not sure that the pipeline will be able to finish properly after renaming the IDs, so I am waiting for my samples to finish to see whether I get OTU counts. Probably additional renaming back to the original IDs will be needed after medaka.
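If that turns out to be necessary, something along these lines could strip the stacked prefixes again after medaka (a sketch; it assumes every racon round adds a single-digit 'polishedN' prefix, and the file names are placeholders):

# remove the leading polished1..polished9 prefixes, restoring the original read UUIDs
sed -E 's/^>(polished[0-9])+/>/' medaka_consensus.fasta > medaka_consensus_renamed.fasta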
I attached a modified file.
BTW, should this warning be taken care of?
WARNING: Output Nanopore_results/read_correction/medaka/C34F3_A_R1 already exists, may use old results.
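Would deleting the stale directory before relaunching be the right way to force that step to be recomputed? That is just my guess, e.g.:

# force medaka for this sample to rerun from scratch (path taken from the warning)
rm -r Nanopore_results/read_correction/medaka/C34F3_A_R1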
Thank you again for the tool. I like the output after the Mumu reclustering. The run completed successfully, and renaming the IDs to solve this issue did not affect other steps. The only thing is that the taxonomy (Silva with Mothur) was not added to the final outputs, though it is present in the corresponding folder. I checked whether the renamed IDs are involved, but as far as I can see, they are not at this step, so I am not sure what the reason is. It is unimportant for me, since I planned to reclassify outside of Natrix2 anyway. Closing the issue.
Best,
Hi there, thank you so much for your comments. We have resolved the issue with the taxonomy in the final output and will push the changes to the main pipeline soon. Your feedback helps us improve, so thank you for sharing it with us.
Glad to help! One more question - I read the paper but did not find the answer - do you think species-level annotations are possible for long 16S nanopore reads, or is it better to work at the genus level only (even if species were annotated)?
Hello,
If you're considering taxonomic classification with mothur, I would advise against working at the species level. Mothur itself discourages this practice, as outlined on its official webpage (https://mothur.org/wiki/classify.seqs/#common-questions).
"How do you recommend classifying to the species level? Unfortunately I do not. You will never get species level classification if you are using the RDP or Silva references. They only go to the genus level. Even the greengenes database only has 10% or so of sequences with species level names (greengenes hasn’t been updated in quite a few years). I and many others would contend that using 16S and especially a fragment to get a species name is asking too much - you need a culture and genome sequencing to do that. If someone wanted to give it a shot, they would need to add the species level names to the taxonomy strings. Also, they would need to add many more sequences that represent each species. Outside of a few groups of bacteria where the researchers have carefully selected the region (e.g. Lactobacillus or Staphylococcus), I really think this would be a lot of work for little/no benefit."
If your objective is to work at the species level, an alternative approach could be to employ taxonomic classification via BLAST, setting the 'ident' parameter to the minimum identity you require between the query and target sequences. This threshold can be customized in the options of the Nanopore.yaml file (https://github.com/dbeisser/Natrix2/blob/dev/Nanopore.yaml).
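For orientation, the same identity cutoff can be expressed with a standalone blastn call outside the pipeline (a sketch only; the database name, file names and the 97% threshold are placeholders, and Natrix2's internal call may differ):

# classify representative sequences against a reference, keeping hits with >= 97% identity
blastn -query otu_representatives.fasta \
       -db silva_reference \
       -perc_identity 97 \
       -outfmt "6 qseqid sseqid pident length evalue stitle" \
       -max_target_seqs 5 \
       -out otu_blast_hits.tsv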
Feel free to reach out if you have any further questions!
Best, Dana
Thank you for the detailed reply. I expected as much, but decided to ask here just in case of a miracle.
Best, Timur
It is me again, unfortunately, with a new issue.
I am running Nanopore 16S data. My reads are already assembled, and primers removed. As advised here, I switched to the dev repo and disabled pychopper.
After running for a while, an error was encountered. I thought that it might be caused by RAM and increased it from 120 to 256; however, after relaunching the same project, the same error arose.
As discussed here, the issue is related to duplicate IDs, but I don't know how to fix it here.
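For reference, a quick way to check whether a fasta fed to minimap2 contains duplicate IDs (the file name is a placeholder):

# print any header that occurs more than once
grep '^>' reads_for_minimap.fasta | sort | uniq -d | head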
I hope that it is OK that I created two issues in one day.