bcgsc / ntSynt

Detecting multi-genome synteny using minimizer graph mapping
GNU General Public License v3.0

How to adjust the -k and -w for multiple genome synteny analysis? #47

Status: Open. xxllgg opened this issue 6 days ago

xxllgg commented 6 days ago

Hi there,

Thank you for developing this amazing tool! I am using ntSynt to detect synteny blocks among multiple plant genomes (>10 species) belonging to one genus. Some assemblies are less contiguous (contig N50 ~100 kb) while others are better, but all of them are chromosome-level. The maximum sequence divergence is ~7% and the minimum is ~1%.

With the -d 7 parameter, the results showed no synteny path for some chromosomes. I then tried two other parameter sets:

-d 7 -k 25 -w 200 --block_size 500 --indel 50000 --merge 1000000 --w_rounds 100 50
-d 7 -k 25 -w 10000 --block_size 1000 --indel 50000 --merge 1000000 --w_rounds 5000 1000

These results are even worse than with -d 7 alone. How should I set -w and the other parameters to get synteny paths? Could you give me some advice on how to get a better result in my case (every chromosome should have some synteny blocks)?

Sincerely,
Xiaolong

warrenlr commented 5 days ago

Thank you for your message and interest in ntSynt, Xiaolong.

Initially, I would recommend that you run ntSynt between a pair of conserved chromosome-level assemblies and slowly scale up from there, adjusting the parameters and eventually performing a systematic, broad parameter sweep (while staying within the prescribed ranges indicated in our preprint).

In the supplementary data of our online preprint, we posted initial guidelines for comparing genomes across a broad range of sequence divergence (Table S14: https://www.biorxiv.org/content/biorxiv/early/2024/02/13/2024.02.07.579356/DC1/embed/media-1.pdf?download=true). For the 1% to 10% divergence range, they are:

--block_size 1000 --indel 50000 --merge 100000 --w_rounds 250 100

That could be a starting point, but of course the characteristics and particulars of the genomes being compared will inform how you set the parameters going forward, and a sweep is recommended.
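A minimal Python sketch of such a sweep, assuming only the flags already shown in this thread (-d, -k, -w, --block_size, --indel, --merge, --w_rounds) and genome FASTA files passed as positional arguments. It only builds and prints the command lines; it does not run ntSynt. The Table S14 values are held fixed, while the -k and -w values and the file names are hypothetical placeholders, to be replaced with values inside the ranges given in the preprint.

```python
import itertools
import shlex

# Starting point from Table S14 (1% to 10% divergence), held fixed here.
BASE_ARGS = ["--block_size", "1000", "--indel", "50000",
             "--merge", "100000", "--w_rounds", "250", "100"]

# Hypothetical placeholder values, not recommendations: substitute values
# inside the ranges recommended in the ntSynt preprint.
SWEEP = {
    "-k": ["20", "24"],
    "-w": ["500", "1000"],
}


def build_sweep(genomes, divergence, base_args=BASE_ARGS, sweep=SWEEP):
    """Return one ntSynt argument list per combination of swept values."""
    flags = list(sweep)
    runs = []
    for values in itertools.product(*(sweep[f] for f in flags)):
        # Genomes go first so that the multi-value --w_rounds option
        # cannot swallow them as extra values.
        args = ["ntSynt", *genomes, "-d", str(divergence), *base_args]
        for flag, value in zip(flags, values):
            args += [flag, value]
        runs.append(args)
    return runs


if __name__ == "__main__":
    # Hypothetical file names for a single pair of conserved assemblies.
    for run in build_sweep(["genomeA.fa", "genomeB.fa"], 7):
        print(shlex.join(run))
```

Each printed command can be run separately and the resulting synteny blocks compared before widening the grid or adding assemblies.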

FYI, the developer of ntSynt is currently on vacation, returning next week.

lcoombe commented 7 hours ago

Hi Xiaolong,

Indeed, as you add more input assemblies with higher divergence, it can become difficult to detect synteny blocks. I am continuing to look into the best parameterizations of ntSynt for these cases.

A few notes/suggestions:

I also second Rene's suggestion of starting with fewer assemblies to work out which parameters look good and to get a sense of the synteny between the more and less contiguous assemblies, then scaling up from there.
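A minimal Python sketch of that incremental approach, assuming the assemblies are ranked by contig N50 (an illustrative choice; ranking by divergence from a chosen reference would work the same way). The file names and N50 values are hypothetical. Each yielded subset would correspond to one ntSynt run, starting with the top pair and adding one assembly per round.

```python
def incremental_subsets(assemblies):
    """Yield growing lists of assembly paths: the top two first, then one more per round.

    `assemblies` maps a FASTA path to its contig N50 in bp. Ranking by N50
    is an assumption made for illustration only.
    """
    ranked = sorted(assemblies, key=assemblies.get, reverse=True)
    for n in range(2, len(ranked) + 1):
        yield ranked[:n]


if __name__ == "__main__":
    # Hypothetical assemblies and contig N50 values (bp).
    n50 = {"spA.fa": 25_000_000, "spB.fa": 12_000_000,
           "spC.fa": 100_000, "spD.fa": 3_000_000}
    for subset in incremental_subsets(n50):
        print(len(subset), subset)
```

Stopping at the first subset where some chromosomes lose their synteny blocks points to the assembly that needs attention or a parameter adjustment.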

Thank you for your interest in ntSynt!
Lauren