bacpop / ggCaller

Bifrost graph gene caller.
MIT License

Should I avoid including scaffolds in my input data? #17

Closed HirokiK0 closed 8 months ago

HirokiK0 commented 9 months ago

Thank you for sharing your solution to the error the other day!

I have a question about using the software. The documentation explicitly states that input sequences containing runs of Ns (NNNNN) should be avoided. I am planning to download a comprehensive set of genomes for a certain species from GenBank and analyse them. This dataset will naturally include scaffold-level assemblies. I have considered removing them, but given how many there are, I would prefer to use them.

I have two questions. First, is it safe to use such scaffold data? Second, if scaffold data are undesirable, is there a good way to convert these genomes back into contigs?

samhorsfield96 commented 9 months ago

Hi, the issue with assemblies containing Ns is that Bifrost drops any k-mers containing Ns, resulting in a disjointed graph, which can impact synteny-based clustering. We therefore recommend removing these assemblies from datasets prior to running ggCaller. However, you could run ggCaller with and without the genomes containing Ns and see whether this impacts the results. A small number of Ns is unlikely to impact results greatly.
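
To decide which assemblies belong in the "with Ns" and "without Ns" runs, it can help to count the Ns in each FASTA record first. The following is a minimal Python sketch, not part of ggCaller: the helper names (`read_fasta`, `count_ns`, `split_at_n_runs`) and the `min_run` parameter are hypothetical, and the splitting helper is only one possible way of breaking a scaffold back into contig-like pieces at its N gaps. Whether pieces split this way behave well in ggCaller has not been tested here.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_ns(seq: str) -> int:
    """Count N/n characters in a sequence."""
    return seq.upper().count("N")


def split_at_n_runs(seq: str, min_run: int = 1) -> List[str]:
    """Split a scaffold at runs of at least min_run Ns, dropping empty pieces.

    Runs shorter than min_run are left inside the returned pieces.
    """
    pieces = re.split(r"[Nn]{%d,}" % min_run, seq)
    return [p for p in pieces if p]


if __name__ == "__main__":
    # Toy input standing in for the lines of an assembly FASTA file.
    example = [">scaf1", "ACGTNNNNNACGT", "NNGG", ">contig1", "ACGTACGT"]
    for name, seq in read_fasta(example):
        print(name, "Ns:", count_ns(seq), "pieces:", split_at_n_runs(seq))
```

For a real file, pass an open file handle to `read_fasta` (for example `with open(path) as fh: ...`). Note that splitting discards the ordering and distance information that the scaffold encoded, so the resulting pieces are only contig-like.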

samhorsfield96 commented 8 months ago

Closing due to inactivity.