Open xxllgg opened 6 days ago
Thank you for your message and interest in ntSynt, Xiaolong.
Initially, I would recommend that you run ntSynt between a pair of conserved chromosome-level assemblies and slowly scale up from there, adjusting the parameters as you go and eventually performing a systematic and broad parameter sweep (while staying within the prescribed range indicated in our preprint).
In our online preprint supplementary data, we posted initial guidelines for comparing genomes with a broad range of sequence divergence (Table S14 https://www.biorxiv.org/content/biorxiv/early/2024/02/13/2024.02.07.579356/DC1/embed/media-1.pdf?download=true)
Range 1% - 10% : --block_size 1000 --indel 50000 --merge 100000 --w_rounds 250 100
That could be a starting point, but of course the characteristics and particulars of your genomes being compared will inform how you set the parameters going forward, and a sweep is recommended.
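A sweep like the one suggested above could be scripted roughly as follows. This is only a minimal sketch and not part of ntSynt: the helper names, the FASTA file names, and the grid values are placeholders, and it assumes the assemblies are passed as positional arguments after the flags. It only builds command lines, starting from the Table S14 values for the 1% - 10% range, so please check the flags against ntSynt --help before running anything.

```python
from itertools import product

# Starting values from Table S14 (1% - 10% divergence range).
BASELINE = {"block_size": 1000, "indel": 50000, "merge": 100000, "w_rounds": (250, 100)}


def build_command(fastas, divergence, params):
    """Return an ntSynt argv list; nothing is executed here."""
    cmd = ["ntSynt", "-d", str(divergence)]
    # --w_rounds takes several values, so as a precaution it is emitted before
    # the other flags to keep the FASTA paths from being read as window sizes.
    cmd += ["--w_rounds", *(str(w) for w in params["w_rounds"])]
    for name in ("block_size", "indel", "merge"):
        cmd += [f"--{name}", str(params[name])]
    # Assumption: input assemblies are given as positional arguments.
    return cmd + list(fastas)


def sweep(fastas, divergence, grid):
    """Yield one command per combination of the values in `grid`."""
    names = sorted(grid)
    for values in product(*(grid[n] for n in names)):
        params = {**BASELINE, **dict(zip(names, values))}
        yield build_command(fastas, divergence, params)


if __name__ == "__main__":
    # Placeholder file names and illustrative grid values, not recommendations.
    pair = ["speciesA.fa", "speciesB.fa"]
    grid = {"block_size": [500, 1000], "merge": [100000, 1000000]}
    for cmd in sweep(pair, 7, grid):
        print(" ".join(cmd))
```

Each printed line can be reviewed by hand or passed to a job scheduler. Keeping the baseline in one dictionary makes it easy to vary one or two parameters at a time while everything else stays at the Table S14 values.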
FYI -- The developer of ntSynt is currently on vacation, returning next week.
Hi Xiaolong,
Indeed, when you start to get more and more input assemblies with higher divergence, it can start to be difficult to detect synteny blocks. I am continuing to look into the best parameterizations of ntSynt for these cases.
A few notes/suggestions:
- k and w are the k-mer size and window size for computing minimizers (a selected subset of k-mers), which are used for the multi-genome mapping. These can be good parameters to try different values for in your experiment:
  - k (for example, in the range of 18-32)
  - w, but keep it under 1000 (e.g. 200-1000). Note that the very high values of w that you used in your second test are not recommended, as that will essentially provide a very, very sparse sketch. I have not done any runs with -w higher than 1500.
- You can also try adjusting the --block_size, --merge and --indel parameters. Lowering the first will output shorter synteny blocks, and increasing the latter two will lead to any found synteny blocks being extended.
- I also second Rene's suggestion of starting with fewer assemblies first to work out which parameters look good and to get a sense of the synteny between the more and less contiguous assemblies, then scaling up from there.
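To illustrate why a very large w gives such a sparse sketch, here is a small self-contained Python example of window minimizers. It is a simplified illustration and not ntSynt's implementation: the hash function, the function names, and the random test sequence are all assumptions made for the example.

```python
import hashlib
import random
from collections import deque


def kmer_hash(kmer):
    """Deterministic 64-bit hash of a k-mer (illustrative; not ntSynt's hash)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def minimizer_positions(seq, k, w):
    """Start positions of the minimum-hash k-mer in each window of w consecutive k-mers."""
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    chosen = []
    window = deque()  # k-mer indices in the current window, hashes increasing
    for i, h in enumerate(hashes):
        # Drop candidates that can no longer be the window minimum.
        while window and hashes[window[-1]] > h:
            window.pop()
        window.append(i)
        # Drop the front candidate once it falls out of the window.
        if window[0] <= i - w:
            window.popleft()
        # Record the minimizer once the first full window has been seen.
        if i >= w - 1 and (not chosen or chosen[-1] != window[0]):
            chosen.append(window[0])
    return chosen


if __name__ == "__main__":
    random.seed(0)
    seq = "".join(random.choice("ACGT") for _ in range(50_000))
    for w in (200, 1000, 10000):
        n = len(minimizer_positions(seq, k=25, w=w))
        print(f"w={w}: {n} minimizers over {len(seq)} bp")
```

On random sequence, roughly 2 out of every w + 1 k-mers are selected, so going from w = 200 to w = 10000 leaves about fifty times fewer anchors for the mapping, which becomes a problem when the assemblies are divergent or fragmented.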
Thank you for your interest in ntSynt! Lauren
Hi there, Thank you for developing this amazing tool! I am using ntSynt to detect synteny blocks among multiple plant genomes (>10 species) belonging to one genus. Some assemblies are of lower quality, with a shorter contig N50 (~100 kb), and others are good, but all of them are chromosome-level. The maximum sequence divergence is ~7% and the minimum is ~1%. I used the -d 7 parameter, and the results showed no synteny path for some chromosomes. I then changed the parameters to -d 7 -k 25 -w 200 --block_size 500 --indel 50000 --merge 1000000 --w_rounds 100 50, and to -d 7 -k 25 -w 10000 --block_size 1000 --indel 50000 --merge 1000000 --w_rounds 5000 1000. These results were even worse than with -d 7 alone. How should I set -w and the other parameters to get synteny paths? Could you give me some advice on how to get a better result in my case (all chromosomes should have some synteny blocks)? Sincerely, Xiaolong