Open zactobias44 opened 2 years ago
Sorry for the late reply. I really like the detailed description of your issue.
As you mentioned, the coverage is really low. Using `join_spades_fastg_by_blast.py` is one typical choice if you still want a circular sequence topology with gaps. However, `join_spades_fastg_by_blast.py` is currently immature and not smart enough: it adds all potential gap/overlap edges that it roughly detects from BLASTing against the reference. So we need to do some manual processing on the graph.
For example, as you can see from the attached figure (a Bandage visualization of slimmed_assembly_graph.fastg.txt), there are two parallel paths between edges 13 and 683: edge 692 alone, or the path 691-435-693. Edge 692 is thus redundant, because the path 691-435-693 contains the informative edge 435, which comes from the original assembly.
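To make the parallel-path criterion concrete, here is a minimal Python sketch. It is not part of GetOrganelle: the helper names `paths_between` and `redundant_paths` are hypothetical, the graph is a toy adjacency dict that ignores edge orientation (real FASTG graphs store forward and reverse-complement edges), and it assumes that 691, 692 and 693 are gap/overlap edges added by `join_spades_fastg_by_blast.py` while 435 comes from the original assembly.

```python
from typing import Dict, List, Set

def paths_between(graph: Dict[str, List[str]], start: str, end: str,
                  max_len: int = 5) -> List[List[str]]:
    """List the intermediate edges of every simple path from `start` to `end`."""
    found: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        for nxt in graph.get(node, []):
            if nxt == end:
                found.append(list(path))  # record a copy of the intermediate edges
            elif nxt != start and nxt not in path and len(path) < max_len:
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    dfs(start, [])
    return found

def redundant_paths(paths: List[List[str]], added: Set[str]) -> List[List[str]]:
    """Flag paths made only of added gap/overlap edges, but only when some
    parallel path carries at least one edge from the original assembly."""
    if not any(any(e not in added for e in p) for p in paths):
        return []  # no informative alternative, so nothing is clearly redundant
    return [p for p in paths if p and all(e in added for e in p)]

# The situation between edges 13 and 683 from the figure, ignoring orientation.
graph = {"13": ["692", "691"], "692": ["683"], "691": ["435"],
         "435": ["693"], "693": ["683"]}
added = {"692", "691", "693"}  # assumption: only 435 comes from the original assembly
paths = paths_between(graph, "13", "683")
print(paths)                          # [['692'], ['691', '435', '693']]
print(redundant_paths(paths, added))  # [['692']]
```

Limiting `max_len` keeps the depth-first search cheap on a large graph; the flagged edges are only candidates and should still be checked visually in Bandage.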
Besides comparing parallel paths, another criterion for removing redundant edges is identifying unrealistic edges. For example, edge 686 indicates a possible 1492-bp overlap between edges 463 and 445, which is impossible because neither 463 nor 445 is long enough to overlap by that much. So we remove 686. This should definitely be scheduled to be improved/fixed in `join_spades_fastg_by_blast.py`. I would like to keep this issue open until it is fixed. @wbyu
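The unrealistic-overlap criterion reduces to a length check: an overlap of L bp between the end of one edge and the start of another requires both edges to be at least L bp long, so the check flags the overlap if either flanking edge is shorter than L. A minimal Python sketch with hypothetical helper names; the edge lengths below are made up for illustration (the real lengths of 463 and 445 are not given in this thread), and edges 500, 501 and 700 are invented.

```python
from typing import Dict, List, Tuple

def overlap_is_unrealistic(overlap_bp: int, len_a: int, len_b: int) -> bool:
    # An overlap of `overlap_bp` between the end of A and the start of B requires
    # both A and B to be at least that long; otherwise the overlap cannot exist.
    return overlap_bp > min(len_a, len_b)

def unrealistic_overlap_edges(
    candidates: List[Tuple[str, str, str, int]], lengths: Dict[str, int]
) -> List[str]:
    """candidates: (overlap_edge, edge_a, edge_b, overlap_bp); lengths: edge -> bp."""
    return [
        edge
        for edge, a, b, overlap_bp in candidates
        if overlap_is_unrealistic(overlap_bp, lengths[a], lengths[b])
    ]

# Hypothetical lengths for illustration only, not taken from the real graph.
lengths = {"463": 900, "445": 1100, "500": 2000, "501": 1800}
candidates = [("686", "463", "445", 1492), ("700", "500", "501", 120)]
print(unrealistic_overlap_edges(candidates, lengths))  # ['686']
```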
Following these two criteria, we can remove all the edges marked in red and obtain the circular sequence with gaps/overlaps.
Please let me know if you have further questions.
Thanks for describing this solution! I'll go ahead and process these manually. A question about removing edges: do you just go into the fastg file and remove the entries for the edges in question, and then run `get_organelle_from_assembly.py` with that edited fastg as input?
You may do that. But I usually do it in Bandage, which can save the edited graph as a GFA file, and then run `get_organelle_from_assembly.py` with that GFA file.
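If you prefer to edit the FASTG text directly rather than in Bandage, the edit has to do two things: drop every record of the unwanted edge (both its forward entry and its reverse-complement `'` entry) and remove that edge from the neighbour lists of the remaining headers. A minimal Python sketch, assuming SPAdes-style headers of the form `>NAME:NEIGHBOUR1,NEIGHBOUR2;` (or `>NAME;` with no neighbours) and that the edge ID is the second `_`-separated field of each name; `remove_edges_from_fastg` is a hypothetical helper, not a GetOrganelle function, and the graph below is a toy example.

```python
from typing import Iterable, List, Set

def edge_id(name: str) -> str:
    # EDGE_692_length_10_cov_1.0' -> "692"; a trailing ' marks the reverse complement
    return name.rstrip("'").split("_")[1]

def remove_edges_from_fastg(lines: Iterable[str], drop: Set[str]) -> List[str]:
    """Drop FASTG records whose edge ID is in `drop` and strip those edges
    from the neighbour lists of the records that are kept."""
    out: List[str] = []
    keep_record = True
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            body = line[1:].rstrip(";")
            name, _, neigh = body.partition(":")
            keep_record = edge_id(name) not in drop
            if keep_record:
                kept = [n for n in neigh.split(",") if n and edge_id(n) not in drop]
                out.append(">" + name + (":" + ",".join(kept) if kept else "") + ";")
        elif keep_record:
            out.append(line)  # sequence line of a record we keep
    return out

# Toy graph: remove edge 692 (forward and reverse-complement records).
fastg = """>EDGE_13_length_100_cov_5.0:EDGE_692_length_10_cov_1.0,EDGE_691_length_10_cov_1.0;
ACGT
>EDGE_692_length_10_cov_1.0:EDGE_683_length_80_cov_4.0;
NNNN
>EDGE_692_length_10_cov_1.0':EDGE_13_length_100_cov_5.0';
NNNN
>EDGE_691_length_10_cov_1.0;
TTTT
""".splitlines()
print("\n".join(remove_edges_from_fastg(fastg, {"692"})))
```

Because the function accepts any iterable of lines, you can pass an open file handle and write the result to a new .fastg file, then open it in Bandage to confirm the graph looks as expected before running `get_organelle_from_assembly.py` on it.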
Oh okay great. Thank you!
First, let me just say how great this software is! I see great improvement over other mitochondrial genome assembly tools.
Now for the issue: I have pretty low coverage (average coverage < 20, average base coverage ~60) for many of my samples, so I end up with a lot of incomplete assemblies with 3-8 scaffolds. I am trying to insert gaps (using the approach in the FAQ) for downstream analysis using a published mitogenome for the same species, but am running into issues.
`join_spades_fastg_by_blast.py` appears to be working just fine; it is the following command that I am struggling with:
get_organelle_from_assembly.py -g extended*Bs_mito_sc6ab.label.fastg.Ncontigs_added.fastg --genes /vortexfs1/home/ztobias/BotryllusMito/metadata/Bs_mito_sc6ab.label.fasta -t 8 -o out -F anonym --expected-min-size 1450 --expected-max-size 1550 --overwrite
The first type of error I get is the following (from get_org.log.txt)
I am not sure why this fails, as it works for many other samples. I have attached get_org.log.txt and the post-slimming graph (renamed with a .txt extension b/c GitHub won't accept .fastg): get_org.log.txt, slimmed_assembly_graph.fastg.txt
I am also receiving a different type of error and will post in a separate issue.
Thanks!