Open tongyin121 opened 3 weeks ago
Currently, synteny diversity and Syn-FST can only be calculated using genomic data. Syn-FST is derived from synteny diversity. The most effective tool for identifying syntenic and non-syntenic regions is SyRI, which relies on genome alignment. However, no tool currently exists that can identify syntenic regions using third-generation or second-generation sequencing data. In the future, we plan to incorporate sequencing data to calculate synteny diversity.
Thank you for your reply. It resolved my confusion.
Another question: how can I get SynFst values per window (e.g. 10 kb)? I think I can reconstruct the SynFst file and then use the SynDiv_c window command to get what I want. However, I am confused about the States columns in the .cal file. Could you please tell me whether this approach is correct and what the States columns mean?
The `All_States` column represents the total number of pairings in the population, and `Syntenic_States` represents the number of syntenic pairings.
For example, if there are three genomes A, B, and C, the possible pairings would be AB, AC, and BC. If only AB is syntenic, then `All_States` would be 3, and `Syntenic_States` would be 1. `Syntenic_Diversity` is calculated as 1 - 1/3.
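To make the arithmetic concrete, here is a minimal Python sketch of the same three-genome example. It assumes the formula generalizes from the example above as 1 - Syntenic_States / All_States. The function and variable names are illustrative only and are not part of SynDiv.

```python
from itertools import combinations


def synteny_diversity(syntenic_states: int, all_states: int) -> float:
    """Assumed formula, inferred from the example: 1 - syntenic / all."""
    if all_states <= 0:
        raise ValueError("all_states must be positive")
    if not 0 <= syntenic_states <= all_states:
        raise ValueError("syntenic_states must be between 0 and all_states")
    return 1 - syntenic_states / all_states


# Three genomes give three unordered pairings: AB, AC, BC.
genomes = ["A", "B", "C"]
pairs = list(combinations(genomes, 2))

# Hypothetical input: only the A-B pairing is syntenic at this position.
syntenic_pairs = {("A", "B")}

all_states = len(pairs)
syntenic_states = sum(1 for pair in pairs if pair in syntenic_pairs)
diversity = synteny_diversity(syntenic_states, all_states)

print(all_states, syntenic_states, round(diversity, 3))  # 3 1 0.667
```

With n genomes, the number of pairings is n(n-1)/2, so `All_States` grows quadratically with the population size.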
I am curious about the process of calculating Syn_FST. The .cal files were generated from syri.out and .align files. However, how can we obtain a .cal file from second-generation sequencing data?