Open jakobnissen opened 1 year ago
Thanks for using metaMDBG!
Unfortunately, this is a limitation of the multi-k approach: it doesn't produce a nice assembly graph. The only thing you can do is generate intermediate graphs with the "gfa" command; this command provides the path of the final contigs in the generated graph.
I won't presume to know more about the inner workings of assemblers than you, but I think SPAdes uses multiple k-mer sizes and still produces a graph. In SPAdes, I believe the information from smaller k-mers is used to construct the graph of larger k-mers, and the final contigs are extracted from the final graph only. Therefore, every output contig can be found in the final graph. If I understand your preprint correctly, something similar happens in metaMDBG: the mContigs are extracted from the final graph with the largest k-mer size, after which each mContig is converted to contigs. So, isn't there a 1-to-1 correspondence between paths in the final graph (consisting of multiple unitigs), mContigs, and contigs?
The final contigs are extracted from thousands of variants of the assembly graph with different properties (overlap size, abundance filter, etc.). I can't provide all of them, so the gfa command generates the initial one. The only thing you can do then is use the unitig -> contig mapping to study contig relationships.
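As a rough illustration of the unitig -> contig approach described above, the sketch below projects unitig-level links from a GFA onto contigs. The toy GFA text, the `unitig_to_contigs` dictionary, and all function names are hypothetical. The format in which metaMDBG reports the mapping is not shown in this thread, so the mapping is assumed to have already been parsed into a dictionary.

```python
from collections import defaultdict


def parse_gfa_links(gfa_text):
    """Yield (from_segment, to_segment) pairs from GFA1 'L' lines."""
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 4 and fields[0] == "L":
            yield fields[1], fields[3]


def contig_adjacency(gfa_text, unitig_to_contigs):
    """Relate two contigs when they share a unitig or when their unitigs are linked.

    unitig_to_contigs is an assumed, pre-parsed mapping:
    unitig name -> list of contig names that pass through it.
    """
    adjacency = defaultdict(set)

    def connect(contigs_a, contigs_b):
        for a in contigs_a:
            for b in contigs_b:
                if a != b:
                    adjacency[a].add(b)
                    adjacency[b].add(a)

    # Contigs that pass through the same unitig are related.
    for contigs in unitig_to_contigs.values():
        connect(contigs, contigs)

    # Contigs whose unitigs are joined by a link are related.
    for u, v in parse_gfa_links(gfa_text):
        connect(unitig_to_contigs.get(u, ()), unitig_to_contigs.get(v, ()))

    return {contig: sorted(neighbors) for contig, neighbors in adjacency.items()}


# Toy input (hypothetical data): three unitigs in a chain, u1 -> u2 -> u3.
EXAMPLE_GFA = "\n".join([
    "S\tu1\tACGT",
    "S\tu2\tGTTA",
    "S\tu3\tTACC",
    "L\tu1\t+\tu2\t+\t2M",
    "L\tu2\t+\tu3\t+\t2M",
])
EXAMPLE_MAPPING = {"u1": ["ctg1"], "u2": ["ctg1", "ctg2"], "u3": ["ctg3"]}

if __name__ == "__main__":
    print(contig_adjacency(EXAMPLE_GFA, EXAMPLE_MAPPING))
```

Link orientations and overlaps are ignored here for brevity; a real analysis would probably want to keep them, since they determine which contig ends are actually adjacent.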
I see. That's unfortunate for us, but we'll see what we can do. Thank you for explaining.
@GaetanBenoitDev, with rust-mdbg there is a secondary tool, magic_simplify_meta, that is used to generate the overall graph. It is mentioned here and also in the recent binning publication from Heng Li. So there is some precedent for this, at least with rust-mdbg.
For others who come across this thread and see the comment above: looking over the script, it's not immediately apparent how magic_simplify_meta would generate a new GFA from multiple steps or GFAs. It appears to work from a single GFA instead, likely one produced by rust-mdbg rather than generated by the script itself.
Hi, I think you should just use the contig -> unitig mapping information; metaFlye also uses this system. rust-mdbg can easily provide this graph because it is a very simple assembler. Depending on the methods an assembler uses to generate its contigs, it can be hard to provide such a graph.
There are also things that are hard to represent. Let's say I have resolved a species into a single contig, but it has strain variability. It's more natural to preserve that variability in the graph (hence a fragmented graph even for the resolved strain) and represent the resolved contig as a path in this graph.
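To make the "contig as a path" idea concrete, here is a minimal sketch that writes a contig as a GFA1 path (`P`) record over oriented unitigs. The unitig names, the bubble layout, and the helper name `gfa_path_line` are hypothetical; this is not metaMDBG's output, only an illustration of how a path record can coexist with a graph that still holds strain variants.

```python
def gfa_path_line(contig_name, oriented_unitigs):
    """Format a GFA1 'P' record from (unitig_name, orientation) pairs."""
    for name, orientation in oriented_unitigs:
        if orientation not in ("+", "-"):
            raise ValueError(f"bad orientation for {name}: {orientation!r}")
    segments = ",".join(f"{name}{orientation}" for name, orientation in oriented_unitigs)
    # '*' leaves the overlaps between consecutive segments unspecified.
    return "\t".join(["P", contig_name, segments, "*"])


# Hypothetical strain bubble: u1 branches into u2a and u2b, which rejoin at u3.
# The graph keeps both branches; the contig follows only one of them.
EXAMPLE_PATH = gfa_path_line("ctg1", [("u1", "+"), ("u2a", "+"), ("u3", "+")])

if __name__ == "__main__":
    print(EXAMPLE_PATH)
```

In this toy case the unitig u2b stays in the graph as the alternative strain, while the path record shows which branch the contig took.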
Thanks for creating such a nice assembler.
In the research we're doing, we're looking to exploit the relationship between contigs in the assembly graph. Several other tools, such as Recycler, GraphBin, metaplasmidSPAdes, SCAPP and CCVAE are, like us, extracting information from the relationship between contigs in the graph. The graph is a rich source of information, and I believe more tools will be working on assembly graphs in the future.
We realize that hifiasm-meta does produce a graph and so can use that, but we believe metaMDBG produces better contigs than hifiasm-meta. As mentioned in #2, metaMDBG can produce a graph, but there is currently no way for the user to relate the graph with the actual output contigs.
Would it be possible to add functionality in metaMDBG, such that a graph of the final contigs can be produced?