broadinstitute / pyfrost

Python bindings for Bifrost with a NetworkX compatible API
BSD 3-Clause "New" or "Revised" License

Kmer colours #6

Closed: LiamGKing closed this issue 3 years ago

LiamGKing commented 3 years ago

Greetings. This might be more of a Bifrost question, but I believe you may be well suited to answer it.

It is mentioned that the k-mers contained in a unitig may not all share the same colours, but so far I have not been able to observe this. What cases would lead to k-mers not sharing all colours with the other k-mers in their unitig? Ideally, I would like to be able to tell when all k-mers in a unitig share a consensus set of colours, so that I would not have to check each individual k-mer for its colours.

Thank you for your work on this; any input would be appreciated.

GuillaumeHolley commented 3 years ago

I think the most common case is when the k-mers of a genome B fill in the gaps of a genome A, causing unitigs to merge. A gap in dBG(A), caused by a lack of coverage in the reads of A, is represented by two (unitig) tips. Now suppose B does not have such a lack of coverage for the same sequence, so in dBG(B) the sequence is represented as a single unitig. When building cdBG(A,B), the two tips of A "merge" into the unitig of B. If you look at the colors of that unitig, you can see that its extremities have both colors, A and B, but its middle only has the color of B.
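To make the merge scenario concrete, here is a minimal pure-Python sketch of a toy colored de Bruijn graph. It does not use pyfrost or Bifrost; the helpers `colored_kmers`, `unitigs`, `spell` and `colour_profile` are hypothetical names made up for this illustration. As a simplifying assumption, it only indexes forward-strand k-mers (Bifrost works with canonical k-mers, i.e. it also accounts for reverse complements) and skips isolated cycles during compaction.

```python
from collections import defaultdict

def colored_kmers(genomes, k):
    """Map each k-mer to the set of colours (genomes) it occurs in.

    `genomes` maps a colour name to a list of sequences (reads or contigs).
    Simplification: forward-strand k-mers only, no reverse complements.
    """
    table = defaultdict(set)
    for colour, seqs in genomes.items():
        for seq in seqs:
            for i in range(len(seq) - k + 1):
                table[seq[i:i + k]].add(colour)
    return dict(table)

def successors(kmer, table):
    return [kmer[1:] + c for c in "ACGT" if kmer[1:] + c in table]

def predecessors(kmer, table):
    return [c + kmer[:-1] for c in "ACGT" if c + kmer[:-1] in table]

def unitigs(table):
    """Compact the graph into maximal non-branching paths (lists of k-mers)."""
    paths = []
    for kmer in table:
        preds = predecessors(kmer, table)
        # Not a unitig start if it has exactly one predecessor and that
        # predecessor has exactly one successor (it is mid-path).
        if len(preds) == 1 and len(successors(preds[0], table)) == 1:
            continue
        path = [kmer]
        succ = successors(kmer, table)
        # Extend while the path cannot branch in either direction.
        while len(succ) == 1 and len(predecessors(succ[0], table)) == 1:
            path.append(succ[0])
            succ = successors(succ[0], table)
        paths.append(path)
    return paths

def spell(path):
    """Turn a path of overlapping k-mers back into a sequence."""
    return path[0] + "".join(km[-1] for km in path[1:])

def colour_profile(path, table):
    """Per-k-mer colour sets along a unitig, and whether they all agree."""
    profile = [table[km] for km in path]
    uniform = all(colours == profile[0] for colours in profile)
    return profile, uniform

genome_a = ["ATGCGT", "ACCTGA"]  # A: coverage gap in the middle
genome_b = ["ATGCGTACCTGA"]      # B: covers the whole sequence

# dBG(A) alone: the gap leaves two separate tips.
print([spell(p) for p in unitigs(colored_kmers({"A": genome_a}, k=4))])

# cdBG(A,B): the tips merge into B's unitig, with mixed colours inside.
table = colored_kmers({"A": genome_a, "B": genome_b}, k=4)
for path in unitigs(table):
    profile, uniform = colour_profile(path, table)
    print(spell(path), "uniform colours:", uniform)
    for km, colours in zip(path, profile):
        print(" ", km, sorted(colours))
```

With k = 4, dBG(A) yields the two tips `ATGCGT` and `ACCTGA`, while cdBG(A,B) yields the single unitig `ATGCGTACCTGA`: its first three and last three k-mers carry colours A and B, and the three middle k-mers (`CGTA`, `GTAC`, `TACC`) carry only B, so `uniform` is `False`. In this sketch, deciding whether a unitig has a colour consensus is a single linear pass that compares each k-mer's colour set with the first one.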

For reads, this is caused by a lack of coverage. For assemblies, it can have many causes: misassemblies, lack of coverage again, scaffolding, etc.

LiamGKing commented 3 years ago

Thanks so much! This was a very thorough explanation.