Open Huyuxi08 opened 3 years ago
This is expected if there are no singletons in your data (or SNPs with N-1 mutations, where N is the sample size). You can use the -c parameter to extend the "edges" of the U-shaped expected SFS used for mu_sfs. Try, for example, -c 3 or -c 5. You can also use SweeD to generate the SFS and check whether there are indeed no singletons in your data.
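If SweeD is not at hand, the same check can be sketched in a few lines of Python. This is only an illustration and not part of RAiSD or SweeD: the function name `edge_class_counts` and the demo records are made up, and it assumes an uncompressed, tab-separated VCF with biallelic sites and a GT field. It tallies how many sites carry k copies of the ALT allele and how many carry k copies of the REF allele, so the two edge classes (singletons and N-1) can be read off directly.

```python
import re
from collections import Counter


def edge_class_counts(vcf_lines):
    """Tally biallelic, polymorphic sites by ALT-allele count and by REF-allele count.

    Returns (by_alt, by_ref): by_alt[1] is the number of singleton sites,
    by_ref[1] is the number of sites in the N-1 class (all called alleles
    but one are ALT). Missing alleles ('.') are ignored per site.
    """
    by_alt, by_ref = Counter(), Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 or "," in fields[4]:
            continue  # no sample columns, or multiallelic site
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        alt = ref = 0
        for sample in fields[9:]:
            parts = sample.split(":")
            gt = parts[gt_idx] if gt_idx < len(parts) else "."
            for allele in re.split(r"[/|]", gt):
                if allele == "1":
                    alt += 1
                elif allele == "0":
                    ref += 1
        if alt > 0 and ref > 0:  # keep only sites that are polymorphic in the sample
            by_alt[alt] += 1
            by_ref[ref] += 1
    return by_alt, by_ref


# Hypothetical demo records (three diploid samples), not taken from the thread.
def _row(pos, *gts):
    return "\t".join(["1", str(pos), ".", "A", "G", ".", ".", ".", "GT", *gts])


DEMO_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\ts3",
    _row(100, "0/1", "0/0", "0/0"),  # singleton
    _row(200, "1/1", "0/0", "0/0"),  # two ALT copies
    _row(300, "1/1", "1/1", "0/1"),  # N-1 class
    _row(400, "0/0", "0/0", "0/0"),  # monomorphic, skipped
]

by_alt, by_ref = edge_class_counts(DEMO_VCF)
print("singleton sites:", by_alt[1])
print("N-1 sites:", by_ref[1])
```

For a real file, pass an open file handle instead of `DEMO_VCF`, for example `edge_class_counts(open("hm.wes.vcf"))`. If both edge counts come out as zero, that matches the situation described above.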
It is indeed caused by the lack of singletons, because I incorrectly filtered out low-frequency sites. Thanks so much!
Hi @alachins ,
First, let me thank you for this great software. I am having the same issue as @Cynthial0l when running RAiSD without the -c parameter. In my case there are certainly singletons in my VCF, but the lines used for variant calling and downstream analysis are isogenic/highly homozygous. There are therefore no heterozygous sites in my data, and I believe this is why I obtain the same μ_SFS value for every single position. Do you think that running RAiSD with the -c parameter would be appropriate for my data? Would the results be reliable in that case? And how can I determine the most appropriate value for the -c parameter in my case? Sorry to bother you with so many questions.
If there are no heterozygous sites in your data, you will need to increase the -c parameter (default 1) to allow RAiSD to recognize the singletons. Your results will be reliable. There is no mechanism in place at the moment to determine what the best -c parameter value is. To only include singletons in the case of highly homozygous lines you can provide (with -c) the same number as the ploidy. Note that RAiSD uses -c differently than the -y parameter that also specifies the ploidy.
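The allele-count arithmetic behind this advice can be shown with a small Python sketch. It does not reproduce how RAiSD handles -c internally; the helper names and genotype rows are hypothetical. It only shows that a variant private to one fully homozygous diploid line appears as two ALT copies, so the lowest populated class equals the ploidy rather than 1.

```python
def alt_allele_count(genotypes):
    """Count ALT alleles across genotype strings such as '0/0' or '1|1' (one biallelic site)."""
    return sum(gt.replace("|", "/").split("/").count("1") for gt in genotypes)


def smallest_populated_class(sites):
    """Return the smallest non-zero ALT-allele count over all sites, or None if there is none."""
    counts = [alt_allele_count(site) for site in sites]
    variant = [c for c in counts if c > 0]
    return min(variant) if variant else None


# Hypothetical genotypes for four diploid samples per site.
inbred_sites = [
    ["1/1", "0/0", "0/0", "0/0"],  # private to one homozygous line: 2 ALT copies
    ["1/1", "1/1", "0/0", "0/0"],  # shared by two lines: 4 ALT copies
]
outbred_sites = [
    ["0/1", "0/0", "0/0", "0/0"],  # heterozygous singleton: 1 ALT copy
    ["1/1", "0/0", "0/0", "0/0"],
]

print(smallest_populated_class(inbred_sites))   # 2
print(smallest_populated_class(outbred_sites))  # 1
```

Under these assumptions, the value 2 for the inbred case corresponds to the suggestion of passing the ploidy to -c for highly homozygous diploid lines.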
-- Nikolaos Alachiotis
Dear Nikos,
Thank you very much for your quick reply! I have run several trials using different values for the -c parameter (ranging from 2 to 6) and the results do not seem to vary greatly (at least the overall pattern of the peaks is very similar). Thank you again!
Hi, I'm trying to use RAiSD on my WGS data, and the program runs smoothly. However, the μ_SFS values are identical along the whole genome (μ_SFS = 1.039e-11). I got this plot using the command: RAiSD -n wes -I hm.wes.vcf -w 50 -D -R -a 123 -P
RAiSD_Plot.wes.Lachesis_group3.pdf
I am not sure what causes it. Any help would be appreciated!