nick-youngblut closed this issue 6 years ago.
Hi!
It looks like a bug in parsing the minimap2 output. It seems rather specific, so we can't reproduce it without your help. Could you please send us the raw minimap2 output from
/ebio/abt3_projects/vadinCA11/data/metagenome/simulated_metagenomes/shallow_sequencing/llmga/bin_refine/DAS_Tool/metaquast/runs_per_reference/1302858/minimap_output/metabat2_low_PE-003-contigs_broken.coords_tmp
or at least one of the input files, e.g.
/ebio/abt3_projects/vadinCA11/data/metagenome/simulated_metagenomes/shallow_sequencing/llmga/bin_refine/DAS_Tool//bins_DASTool_bins/metabat2_low_PE.003.contigs.fa
Thank you!
Thanks for looking into this issue! There is no minimap_output directory under metaquast/runs_per_reference/1302858/. Here's the contigs input file: metabat2_low_PE.003.contigs.fa.zip
Sorry, I gave you the incorrect path. The minimap output should be in metaquast/runs_per_reference/1302858/contigs_reports/minimap_output, so the correct full path is
/ebio/abt3_projects/vadinCA11/data/metagenome/simulated_metagenomes/shallow_sequencing/llmga/bin_refine/DAS_Tool/metaquast/runs_per_reference/1302858/contigs_reports/minimap_output/metabat2_low_PE-003-contigs_broken.coords_tmp
By the way, we checked your contigs and found nothing suspicious there, so could you please attach one of your references, too:
/ebio/abt3_projects/databases/simulated_metagenomes/source_genomes/1302858.fasta
Sorry for the delay. Here are the files. Thanks again for helping with this issue!
1302858.fasta.zip metabat2_low_PE-003-contigs_broken.coords_tmp.zip
Hmm, according to the metaquast.log, Quast crashed while trying to parse 476978:53 as an integer value. Also according to the log, Quast was parsing the output for reference 1302858 located in raw_coords_fpath='/ebio/abt3_projects/vadinCA11/data/metagenome/si...put/metabat2_low_PE-003-contigs_broken.coords_tmp'. We looked into this file and there is no 476978:53 in it! (Only 476978 is present, followed by a tab character, which is parsed correctly.)
Could you please run grep -r "476978:53" /ebio/abt3_projects/vadinCA11/data/metagenome/simulated_metagenomes/shallow_sequencing/llmga/bin_refine/DAS_Tool/metaquast/runs_per_reference/1302858/contigs_reports/minimap_output/ to check whether this error-causing fragment is present in some other minimap output file? If you find something, please attach the corresponding file here.
I suspect that the entire issue could be due to an input/output problem on your machine. To check that, you could rerun exactly the same command and see whether this error appears again. Note that MetaQuast will reuse already processed stages of the pipeline, so the rerun should take less time than the original run. Let us know how it goes!
I ran grep -r "476978:53" /ebio/abt3_projects/vadinCA11/data/metagenome/simulated_metagenomes/shallow_sequencing/llmga/bin_refine/DAS_Tool/metaquast/runs_per_reference/1302858/contigs_reports/minimap_output/, and there were no hits.
I did get 2 hits when running the following:
$ grep -r "476978" /ebio/abt3_projects/vadinCA11/data/metagenome/simulated_metagenomes/shallow_sequencing/llmga/bin_refine/DAS_Tool/metaquast/runs_per_reference/1302858/contigs_reports/minimap_output/
/ebio/abt3_projects/vadinCA11/data/metagenome/simulated_metagenomes/shallow_sequencing/llmga/bin_refine/DAS_Tool/metaquast/runs_per_reference/1302858/contigs_reports/minimap_output/metabat2_low_PE-003-contigs_broken.coords_tmp:coassemble_21386_1 2631 0 2603 + 1302858_1302858.PRJNA192621.CP006647 907294 476978 479581 2568 2603 60 NM:i:35 ms:i:2393 AS:i:2393 nn:i:0 tp:A:P cm:i:239 s1:i:2331 s2:i:0 dv:f:0.0052 cg:Z:2603M cs:Z::821*tg:1*at*ag:2*ag:2*ag:9*ga:6*tc:18*ga:2*ct:3*ct:41*tc:24*tc:3*ga:4*ct*ct:2*tc:9*ag:22*cg:4*ct:1416*ga:5*ga:18*gt*ct:8*ag:7*tg:10*ag:11*tc*ga:46*ga:32*tg:11*ct:7*ag:1*ag*ga:3*gt:20
/ebio/abt3_projects/vadinCA11/data/metagenome/simulated_metagenomes/shallow_sequencing/llmga/bin_refine/DAS_Tool/metaquast/runs_per_reference/1302858/contigs_reports/minimap_output/metabat2_low_PE-003-contigs.coords_tmp:coassemble_21386 2631 0 2603 + 1302858_1302858.PRJNA192621.CP006647 907294 476978 479581 2568 2603 60 NM:i:35 ms:i:2393 AS:i:2393 nn:i:0 tp:A:P cm:i:239 s1:i:2331 s2:i:0 dv:f:0.0052 cg:Z:2603M cs:Z::821*tg:1*at*ag:2*ag:2*ag:9*ga:6*tc:18*ga:2*ct:3*ct:41*tc:24*tc:3*ga:4*ct*ct:2*tc:9*ag:22*cg:4*ct:1416*ga:5*ga:18*gt*ct:8*ag:7*tg:10*ag:11*tc*ga:46*ga:32*tg:11*ct:7*ag:1*ag*ga:3*gt:20
When I re-ran metaquast, it did complete successfully. However, the resulting report.html file does not show any tables/plots. Maybe that's due to the high number of reference genomes.
> I did get 2 hits when running the following:

"476978" by itself is fine, so it does no harm if it appears elsewhere. The problem is the "number:number" pattern, which cannot be parsed as an integer value.
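A more targeted check than grepping for the raw string is to look only at the integer columns of each PAF line and flag any value that is not a plain run of digits, such as 476978:53. Below is a minimal Python sketch, assuming the files under minimap_output are tab-separated PAF; find_bad_int_fields is a hypothetical helper, not part of QUAST.

```python
import re
from pathlib import Path

# Indices of the integer columns among the 12 mandatory PAF columns
# (qlen, qstart, qend, tlen, tstart, tend, nmatch, alnlen, mapq).
INT_COLUMNS = (1, 2, 3, 6, 7, 8, 9, 10, 11)
PLAIN_INT = re.compile(r"[0-9]+")


def find_bad_int_fields(directory):
    """Yield (path, line_no, column, value) for integer columns that are not plain digit strings."""
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps stray binary files from aborting the scan.
        with path.open(errors="replace") as handle:
            for line_no, line in enumerate(handle, start=1):
                fields = line.rstrip("\n").split("\t")
                for col in INT_COLUMNS:
                    if col < len(fields) and not PLAIN_INT.fullmatch(fields[col]):
                        yield path, line_no, col, fields[col]
```

Iterating over find_bad_int_fields() on the minimap_output directory and printing each hit lists every offending file, line, and column. Restricting the check to the integer columns avoids false positives from optional tags such as NM:i:35, which legitimately contain colons.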
> When I re-ran metaquast, it did complete successfully.

That is great! So my suspicion that the original issue was caused by a temporary I/O problem looks likely to be correct.
> However, the resulting report.html file does not show any tables/plots. Maybe that's due to the high number of reference genomes.

This looks like a separate issue. You have a really huge number of references, but MetaQuast should still be able to process them correctly. Could you please attach the report.html and metaquast.log files here?
The other reports (e.g., combined_reference/report.html) do show the data when I view them in the Chrome browser. It just seems to be the main report.html file. The main report file and the log are attached.
We finally found the cause of the problem! It is a known bug in 5.0.0 that is fixed here; the fix will be available in 5.0.1 (planned for this week). Sorry, I completely forgot about this fix and didn't realize that it is related to your issue, too.
The issue occurs only in MetaQUAST and only when using the --split-scaffolds option. It is caused by an incomplete renaming of --scaffolds to --split-scaffolds in v.5.0.0. There is a simple workaround: just use the short version of this option (-s) in your command! It is the same in both v.4.x and v.5.x. Please rerun your assessment, and everything should be fine this time (you can use the same output directory to reuse already generated results and speed up the overall evaluation).
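If the MetaQUAST command is assembled in a script, the workaround amounts to swapping one argument. A minimal Python sketch, assuming the command is held as an argument list; use_short_split_flag is a hypothetical helper, and the file and directory names in the example command are placeholders.

```python
def use_short_split_flag(argv):
    """Return a copy of argv with every --split-scaffolds replaced by -s.

    Hypothetical helper: works around the 5.0.0 MetaQUAST bug described
    in this thread, where only the long option name is affected.
    """
    return ["-s" if arg == "--split-scaffolds" else arg for arg in argv]


# Example command; the assembly file and output directory are placeholders.
cmd = ["metaquast.py", "contigs.fa", "--split-scaffolds", "-o", "metaquast_out"]
print(use_short_split_flag(cmd))
# ['metaquast.py', 'contigs.fa', '-s', '-o', 'metaquast_out']
```

The rewritten list can then be passed to subprocess.run(..., check=True). Keeping the same -o directory lets MetaQuast reuse the stages it has already completed.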
Note that the minimap2 issue that you originally reported is also caused by this --split-scaffolds problem. Also note that you probably don't need --split-scaffolds/-s anymore: starting from v.5.x, we always report scaffold gap misassemblies and other scaffold-related metrics whenever we see stretches of N's in the input assemblies. In v.4.x they were calculated only if the corresponding option was specified. Thus, the only thing that --split-scaffolds/-s does now is add "_broken" versions of the input assemblies to the evaluation.
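The "_broken" assemblies come from splitting scaffolds at stretches of N's. Below is a minimal Python sketch of that idea. It assumes a break at runs of at least 10 N's and pieces named <scaffold>_1, <scaffold>_2, and so on. That naming is consistent with coassemble_21386 appearing as coassemble_21386_1 in the _broken grep hit above, but neither detail is confirmed against QUAST's source. split_scaffold and min_n_run are hypothetical names.

```python
import re


def split_scaffold(name, seq, min_n_run=10):
    """Break a scaffold at runs of >= min_n_run N's (hypothetical helper).

    Pieces are named name_1, name_2, ...; empty pieces produced by leading or
    trailing N runs are dropped, and shorter N runs stay inside a piece.
    """
    if min_n_run < 1:
        raise ValueError("min_n_run must be positive")
    gap = re.compile(f"[Nn]{{{min_n_run},}}")  # e.g. "[Nn]{10,}"
    pieces = [piece for piece in gap.split(seq) if piece]
    return [(f"{name}_{i}", piece) for i, piece in enumerate(pieces, start=1)]


print(split_scaffold("scaf", "ACGT" + "N" * 12 + "GGCC" + "NNN" + "TT"))
# [('scaf_1', 'ACGT'), ('scaf_2', 'GGCCNNNTT')]
```

A scaffold with no qualifying N run still gets the _1 suffix. That is why an unsplit contig can appear under a slightly different name in the _broken assembly.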
Thanks for figuring out the issue! I did try re-running with -s instead of --split-scaffolds, and it seemed to complete successfully.
Great to hear that!
I'm using:
quast 5.0.0 py27pl526ha92aebf_1 bioconda
I get the following error when running metaquast.py on my metagenome:
I'm using ~800 reference genomes.
The log of the run is attached: metaquast.log