bcgsc / ntSynt

Detecting multi-genome synteny using minimizer graph mapping
GNU General Public License v3.0

performance on distantly related species? #55

Open xiaoyezao opened 3 weeks ago

xiaoyezao commented 3 weeks ago

Hi ntSynt developer,

If I understand correctly, ntSynt uses k-mers to infer homology between genomes. I could be mistaken, but it seems that ntSynt works well on simple and closely related genomes, while its effectiveness may be lower with complex or distantly related genomes, such as plant genomes.

Thank you,

Tao

lcoombe commented 3 weeks ago

Hi Tao,

K-mer minimizer sketches are used to compute multi-genome mappings between the input genomes, and the synteny blocks are computed from those mappings. This allows ntSynt to work on multiple genomes across a range of divergences. In our preprint, we show that ntSynt works well even when comparing human, mouse and rat genomes, which have ~15% sequence divergence (as estimated by Mash). We have also had success using ntSynt to compare genomes from diverse genera (e.g., hoverflies, bees). So we have seen ntSynt work well on multiple divergent genomes, particularly compared to existing tools (again, see our preprint for more details). That said, the results will depend on how divergent the genomes are and how many genomes are being compared. As with any tool, there will be a point at which the genomes are too divergent to achieve good synteny block coverage. Tuning some of the parameters could help in those cases.
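To illustrate the general idea of minimizer sketching and why shared minimizers become rarer as genomes diverge, here is a toy Python sketch. It is not ntSynt's implementation: the hash function, the names `minimizers` and `jaccard`, and the parameters `k` (k-mer size) and `w` (window size, in k-mers) are assumptions chosen for illustration, and reverse complements are ignored for brevity.

```python
from __future__ import annotations

import hashlib


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer (illustrative choice, not ntSynt's hash)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizers(seq: str, k: int, w: int) -> set[str]:
    """Return the minimizer k-mers of seq.

    For every window of w consecutive k-mers, keep the k-mer with the
    smallest hash value. Sequences shorter than k + w - 1 yield an empty set.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    hashes = [kmer_hash(km) for km in kmers]
    selected: set[str] = set()
    for start in range(len(kmers) - w + 1):
        # Index of the smallest hash in this window (leftmost on ties).
        best = min(range(start, start + w), key=lambda i: hashes[i])
        selected.add(kmers[best])
    return selected


def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of minimizers shared between two sketches (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    ref = "ACGTTGCAAGCTTAGGCTAACGGATCCGTAGCTAGGATCCA"
    # Same sequence with a few substitutions, standing in for a diverged genome.
    alt = "ACGTTGCAAGCTTTGGCTAACGGATCAGTAGCTAGGCTCCA"
    sketch_ref = minimizers(ref, k=5, w=4)
    sketch_alt = minimizers(alt, k=5, w=4)
    print("shared minimizer fraction:", jaccard(sketch_ref, sketch_alt))
```

In this simplified picture, each substitution can disrupt up to `k` overlapping k-mers, so more divergent inputs share fewer minimizers and leave fewer anchors from which mappings can be built. That is one intuition for why adjusting the k-mer and window sizes could help with more divergent genomes.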

Let me know if you have any other questions - thank you for your interest in ntSynt!

Lauren