Closed by brambloemen 4 months ago
SCAPP is not really designed to work with metaFlye assemblies of long reads, but it may be possible, so I'd like to figure out what is going wrong. Do you have the intermediate SCAPP files and log files from this run, and can you share them? Could you also share the fastg that triggers this error?
Here are the intermediate files: intermediate_files.tar.gz
I used it quite a lot before on flye assemblies, and until now it has worked quite well on almost all datasets. I'm trying to look into what changed in my pipeline that caused this issue.
OK, good to hear that it worked on assemblies before! The error message shows a division-by-zero error at the point where SCAPP divides one coverage by another. Looking in your fastg file (just do a text search for cov_0), I see sequences that are reported as having 0 coverage. I don't know exactly what this means or why it happens in this sample. If you can work out why there are sequences with 0 coverage in the graph and fix that, SCAPP should work as expected. (Note that you could simply replace every cov_0.0 with cov_0.0001 and it should run, but it's still worth figuring out what is going on in this sample.)
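For anyone wanting to try that workaround, here is a minimal sketch in Python. It assumes SPAdes-style fastg headers where coverage appears as `cov_<number>`; the function name `patch_zero_coverage` and the regular expression are illustrative and not part of SCAPP. The pattern is anchored so that values such as cov_0.05 are left untouched, which a plain text replacement of cov_0.0 would not guarantee.

```python
import re

# Matches cov_0, cov_0.0, cov_0.00, ... but not cov_0.05 or cov_0.0001.
# Assumption: the coverage value is followed by a non-numeric character
# (such as ':', ',', ';' or "'") or by the end of the line.
ZERO_COV = re.compile(r"cov_0(?:\.0+)?(?=[^0-9.]|$)")


def patch_zero_coverage(lines, floor="0.0001"):
    """Return (patched_lines, n_replacements) for an iterable of fastg lines.

    Only header lines (starting with '>') are modified; sequence lines
    are passed through unchanged.
    """
    patched = []
    n_replacements = 0
    for line in lines:
        if line.startswith(">"):
            line, count = ZERO_COV.subn("cov_" + floor, line)
            n_replacements += count
        patched.append(line)
    return patched, n_replacements
```

To apply it to a file, read the fastg with `open(path).readlines()`, pass the list to `patch_zero_coverage`, and write the returned lines to a new file so the original graph is kept for comparison.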
Thank you, I was indeed able to fix it by writing cov_0.0001 instead of cov_0.0 in the metaflye_gfa2fastg script whenever the depth of coverage was 0. The issue is caused by the way Flye polishing works: after polishing, some short edges have no reads mapped against them. See https://github.com/mikolmogorov/Flye/issues/473
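A rough sketch of what such a clamp can look like at conversion time, in Python. This is not the code from metaflye_gfa2fastg; the helper name `coverage_from_gfa_segment` is hypothetical, and it assumes that the GFA segment (`S`) lines carry the depth in a `dp` tag such as `dp:i:25`.

```python
def coverage_from_gfa_segment(line, floor=0.0001):
    """Return the depth of a GFA 'S' line, never lower than `floor`.

    Assumption: depth is stored in an optional tag of the form dp:<type>:<value>.
    A missing tag is treated as depth 0 and therefore also clamped to `floor`.
    """
    fields = line.rstrip("\n").split("\t")
    if fields[0] != "S":
        raise ValueError("not a GFA segment line")
    depth = 0.0
    # Optional tags start after the record type, segment name and sequence.
    for tag in fields[3:]:
        parts = tag.split(":", 2)
        if len(parts) == 3 and parts[0] == "dp":
            depth = float(parts[2])
            break
    # Clamp so that downstream coverage ratios never divide by zero.
    return max(depth, floor)
```

The returned value can then be written into the `cov_` part of the fastg header. Clamping here instead of patching the finished fastg keeps the fix in one place in the workflow.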
I'm using SCAPP as part of a snakemake workflow, on Flye --meta assemblies of ONT reads, after converting the assembly_graph.gfa to fastg format. For some datasets it runs perfectly fine, but for others it runs into issues and I can't figure out what the cause is.
The snakemake log:
The error given by scapp: