Closed OmonkeyGOD closed 1 year ago
Greetings, Yue!
Thanks so much! Chase
Hi Chase,
Thank you so much for your reply! That helps a lot. I have attached the example files for question 3. Meanwhile, I still have some follow-up concerns regarding questions 1 & 2.
perl snpgenie_between_group.pl --gtf_file=chrom.gtf --num_bootstraps=100
already contains dN/dS for each gene. Can I use these dN/dS values directly to infer whether the genes are under positive selection? Since I didn't do the sliding window analysis, I am wondering whether it is necessary.
Rscript SNPGenie_sliding_windows.R between_group_codon_results_onegene.txt N S 100 1 1000 40 NONE 6 > onegene.out
I tried it on the two files below and got errors:
between_group_codon_results_onegene.txt
between_group_codon_results.txt
Thanks in advance.
Thank you!
Detecting positive selection is a difficult problem with lots of pitfalls and debate. Statistically significant dN/dS > 1 is one piece of evidence suggestive of positive selection, but other issues (alignment error, stochastic error, etc.) can also cause it. Thus, dN/dS alone can be thought of as a mere candidate generator. Moreover, it usually doesn't make sense for a whole GENE to be under positive selection — if it were, then the entire sequence would be rapidly scrambled, maybe not even alignable. Instead, certain sites or small regions within genes may be under positive selection against a backdrop of purifying selection. This is why sliding windows can be so helpful. I recommend reading background and studies that address similar situations to yours to see what others may have done in similar circumstances.
Again, I'm not sure how to understand a whole gene being under positive selection. Certainly a whole gene can be under purifying selection, causing overall low variation, while one codon might be under positive selection. Whether this situation can be detected (i.e., distinguished from chance) depends on, e.g., dS. What is your biological system, such that you expect an entire gene's length to be under positive selection?
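To make the point about windows concrete, here is a minimal Python sketch of a generic sliding-window dN/dS calculation. It is not SNPGenie's implementation: the tuple layout (N differences, N sites, S differences, S sites per codon), the function name `window_dn_ds`, and the toy numbers are all assumptions chosen for illustration. It shows how a short region can have dN/dS > 1 while the whole-gene ratio stays below 1.

```python
from typing import List, Optional, Tuple

# Hypothetical per-codon record: (N_diffs, N_sites, S_diffs, S_sites).
Codon = Tuple[float, float, float, float]
# One result per window: (1-based start codon, dN, dS, dN/dS).
Window = Tuple[int, Optional[float], Optional[float], Optional[float]]


def window_dn_ds(codons: List[Codon], size: int, step: int = 1) -> List[Window]:
    """Pool differences and sites within each window, then take the ratios."""
    results: List[Window] = []
    for start in range(0, len(codons) - size + 1, step):
        win = codons[start:start + size]
        n_diffs = sum(c[0] for c in win)
        n_sites = sum(c[1] for c in win)
        s_diffs = sum(c[2] for c in win)
        s_sites = sum(c[3] for c in win)
        dn = n_diffs / n_sites if n_sites > 0 else None
        ds = s_diffs / s_sites if s_sites > 0 else None
        # The ratio is undefined when dS is zero or missing.
        ratio = dn / ds if dn is not None and ds else None
        results.append((start + 1, dn, ds, ratio))
    return results


# Toy gene: only the third codon carries nonsynonymous differences.
toy_gene: List[Codon] = [
    (0.0, 2.0, 0.5, 1.0),
    (0.0, 2.0, 0.5, 1.0),
    (2.0, 2.0, 0.5, 1.0),
    (0.0, 2.0, 0.0, 1.0),
]

per_window = window_dn_ds(toy_gene, size=2)   # three overlapping windows
whole_gene = window_dn_ds(toy_gene, size=4)   # one window spanning the gene
```

In this toy data the last two-codon window has a ratio above 1 even though the gene-wide ratio is below 1, which is the kind of signal a whole-gene estimate averages away. Whether such a window is statistically distinguishable from chance is a separate question that this sketch does not address.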
Thanks for the example! I have fixed the R version bugs that were leading to this problem; please download and use the new script. Additionally, please note that requiring a minimum codon count of 40 will eliminate ALL the input data from consideration, resulting in NAs, because your number of defined codons in group 2 is only 18. Thus, I suggest taking some time to consider which parameter values might make sense for your data. For example, you may also wish to reconsider your sliding window size of 100 codons.
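As a rough illustration of why a minimum of 40 leaves nothing to analyze, the Python sketch below applies a minimum-defined-codon filter to toy data. It assumes the filter drops any codon position whose defined-codon count in either group falls below the threshold; that is a reading of the explanation above, not a description of the script's internals. The function name `kept_positions` and the group 1 count of 50 are hypothetical; only the group 2 count of 18 comes from this thread.

```python
from typing import List, Tuple

# Hypothetical per-position record: (defined codons in group 1, in group 2).
Counts = Tuple[int, int]


def kept_positions(counts: List[Counts], min_defined: int) -> List[int]:
    """Return 1-based codon positions where both groups meet the minimum."""
    return [
        i + 1
        for i, (g1, g2) in enumerate(counts)
        if g1 >= min_defined and g2 >= min_defined
    ]


# Five codon positions; group 2 never has more than 18 defined codons.
toy_counts: List[Counts] = [(50, 18)] * 5

with_min_40 = kept_positions(toy_counts, min_defined=40)  # nothing survives
with_min_18 = kept_positions(toy_counts, min_defined=18)  # everything survives

# Largest threshold that still keeps at least one position in this toy data.
max_usable = max(min(g1, g2) for g1, g2 in toy_counts)
```

With every position filtered out, any downstream window statistic has no data to work with, which is consistent with the NAs described above.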
Let me know if that helps! Chase
Hi Chase,
Thank you very much for your reply and updated script. Your explanation helped a lot.
Yue
Hello,
Thank you for creating this program. I am using it to calculate the dN/dS of all genes between populations. I ran snpgenie_between_group.pl for each chromosome and generated the following files: between_group_codon_results.txt and between_group_product_results.txt. I also saw that the manual suggests running SNPGenie_sliding_windows.R afterward. I have the following questions:
Looking forward to your reply. Thank you very much! Yue