Defining Divergence Stratum

Dear Xiuxu,

Many thanks for contacting me; I am very grateful for your feedback.

This article should give you more details: https://hajkd.github.io/orthologr/articles/divergence_stratigraphy.html

In brief, a Divergence Stratum is defined as a decile bin: the set of orthologs whose Ka/Ks (or dN/dS) values fall between two consecutive 10% quantiles of the distribution of Ka/Ks values of all orthologs returned by the pairwise genome comparison.

In other words, imagine you have 10000 orthologous genes and their corresponding Ka/Ks values after performing a pairwise genome comparison with the dNdS() function implemented in the orthologr package. These 10000 Ka/Ks values range from 0 to +Inf, where Ka/Ks < 1 reflects purifying selection, Ka/Ks = 1 reflects neutral evolution, and Ka/Ks > 1 reflects positive selection (in practice, the largest Ka/Ks values I have seen are around 100). Next, you bin these 10000 Ka/Ks values by their deciles (10% quantiles): the lowest 10% of Ka/Ks values fall into decile one (= Divergence Stratum 1), the values between the 10% and 20% quantiles fall into decile two (= Divergence Stratum 2), ..., and the largest values, between the 90% and 100% quantiles, fall into decile ten (= Divergence Stratum 10). This is what the DivergenceMap() function in the orthologr package does. This way, each Divergence Stratum contains (almost) the same number of genes, here about 1000 each.
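
To make the decile binning concrete, here is a minimal Python sketch. It is not the DivergenceMap() implementation from orthologr (which is an R package). The function name divergence_strata, the use of statistics.quantiles with method="inclusive", and the convention that a value equal to a cut point goes to the lower stratum are all assumptions for illustration, and the Ka/Ks values are simulated rather than real data.

```python
import bisect
import random
import statistics
from collections import Counter


def divergence_strata(kaks_values, n_strata=10):
    """Assign each Ka/Ks value to a stratum (1 = lowest decile).

    Hypothetical helper, not the orthologr API. Assumes kaks_values holds
    only finite numbers (NA/Inf values removed beforehand).
    """
    # n_strata - 1 interior cut points; with n_strata=10 these are the deciles.
    cuts = statistics.quantiles(kaks_values, n=n_strata, method="inclusive")
    # bisect_left counts the cut points strictly below v, so a value equal
    # to a cut point lands in the lower stratum (right-closed bins).
    return [bisect.bisect_left(cuts, v) + 1 for v in kaks_values]


if __name__ == "__main__":
    # Simulated, right-skewed Ka/Ks values for 10000 orthologs (most below 1).
    random.seed(42)
    kaks = [random.lognormvariate(-1.5, 1.0) for _ in range(10_000)]
    counts = Counter(divergence_strata(kaks))
    for stratum in range(1, 11):
        print(stratum, counts[stratum])  # prints 1000 for each of the 10 strata
```

With 10000 distinct values every stratum receives exactly 1000 genes. If many genes share the same Ka/Ks value (for example Ka/Ks = 0), tied values cannot be split across a cut point, which is why the strata hold only almost the same number of genes.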

In contrast, phylostratigraphy and its phylostrata categorization may result in some phylostrata containing, e.g., 30% of all genes while others contain only 1%. Since this gene-number bias isn't corrected in any downstream analysis, I wanted to avoid it when defining Divergence Strata.

I hope this helps.

Many thanks and best wishes, Hajk

drostlab / myTAI, issue #5