μ_SFS values are consistent along the genome

Huyuxi08 commented 3 years ago

Hi, I'm trying to use RAiSD on my WGS data, and the program runs smoothly. However, I have run into an issue: the μ_SFS values are constant along the genome (μ_SFS = 1.039e-11). I got this plot using the command: RAiSD -n wes -I hm.wes.vcf -w 50 -D -R -a 123 -P

RAiSD_Plot.wes.Lachesis_group3.pdf

I am not sure what causes it. Any help would be appreciated!

alachins commented 3 years ago

This is expected if there are no singletons in your data (nor SNPs with N-1 derived alleles, where N is the sample size). You can use the -c parameter to extend the "edges" of the U-shaped expected SFS used for μ_SFS; try, for example, -c 3 or -c 5. You can also use SweeD to generate the SFS and check whether your data indeed contain no singletons.
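A minimal Python sketch of the SFS check and of the idea behind -c, assuming a simplified rule in which a SNP counts toward the "edges" when its derived-allele count k satisfies k <= c or k >= N - c. The function names and the example window are hypothetical, and this is not RAiSD's actual μ_SFS code; it only illustrates why a window without singletons (or (N-1)-tons) contributes nothing at the default c = 1, and why raising c changes that.

```python
from collections import Counter


def site_frequency_spectrum(derived_counts, n):
    """Unfolded SFS: entry k-1 is the number of SNPs with k derived alleles, k = 1..n-1."""
    tally = Counter(k for k in derived_counts if 0 < k < n)  # drop monomorphic sites
    return [tally.get(k, 0) for k in range(1, n)]  # index 0 is the singleton class


def edge_fraction(derived_counts, n, c=1):
    """Share of polymorphic SNPs in the outer SFS classes (k <= c or k >= n - c).

    Simplified stand-in for the SFS-edge idea behind mu_SFS (assumption), not RAiSD's code.
    """
    snps = [k for k in derived_counts if 0 < k < n]
    if not snps:
        return 0.0
    edge = sum(1 for k in snps if k <= c or k >= n - c)
    return edge / len(snps)


# Hypothetical window: 12 sampled chromosomes, no singletons and no 11-tons.
window = [2, 4, 6, 10, 2]
sfs = site_frequency_spectrum(window, 12)
print(sfs[0])                          # 0 (no singletons)
print(edge_fraction(window, 12, c=1))  # 0.0
print(edge_fraction(window, 12, c=3))  # 0.6
```

With c = 1 every such window scores zero, so the statistic carries no positional signal; widening the edges with c = 3 lets the 2-tons and the 10-ton contribute.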

Huyuxi08 commented 3 years ago

It was indeed caused by the lack of singletons: I had mistakenly filtered out low-frequency sites. Thanks so much!

biolevol commented 3 years ago

Hi @alachins ,

First, let me thank you for this great software. I am having the same issue as @Cynthial0l when running RAiSD without the -c parameter. In my case there are certainly singletons in my VCF, but the lines used for variant calling and downstream analysis are isogenic, highly homozygous lines. There are therefore no heterozygous sites in my data, and I believe this is why I obtain the same μ_SFS value at every position. Do you think running RAiSD with the -c parameter would be appropriate for my data? Would the results be reliable in that case? And how can I determine the most appropriate -c value for my case? Sorry to bother you with so many questions.

alachins commented 3 years ago

If there are no heterozygous sites in your data, you will need to increase the -c parameter (default 1) so that RAiSD recognizes the singletons. Your results will be reliable. There is currently no mechanism to determine the best -c value. To include only singletons in highly homozygous lines, set -c to the ploidy. Note that RAiSD uses -c differently from the -y parameter, which also specifies the ploidy.

(Emailed reply to biolevol's comment of Thu, Jun 17, 2021 at 1:41 PM.)

-- Nikolaos Alachiotis
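To see why -c should match the ploidy for inbred lines, here is a minimal sketch under the same simplified edge rule (k <= c or k >= N - c). The helper names are hypothetical and this is not RAiSD's code; it assumes diploid, fully homozygous lines and takes N as the number of sampled chromosomes (ploidy × lines). How RAiSD combines this with -y is not modeled.

```python
def derived_count(genotypes):
    """Number of derived (ALT = 1) allele copies across all samples."""
    return sum(allele for gt in genotypes for allele in gt)


def is_edge(k, n, c):
    """True if a SNP with k derived copies out of n falls in the SFS edges for this c."""
    return k <= c or k >= n - c


ploidy = 2
lines = 6             # hypothetical: six fully inbred diploid lines
n = ploidy * lines    # 12 sampled chromosomes (assumption: both copies counted)

# Variant private to one line: that line is homozygous ALT, all others are REF.
private = [(0, 0)] * (lines - 1) + [(1, 1)]
k = derived_count(private)
print(k)                        # 2 (never 1 in fully homozygous data)
print(is_edge(k, n, c=1))       # False: missed with the default -c 1
print(is_edge(k, n, c=ploidy))  # True: caught with -c equal to the ploidy
```

Because every count in fully homozygous data is a multiple of the ploidy, setting c to the ploidy admits exactly the classes k = ploidy and k = N - ploidy, the inbred counterparts of singletons and (N-1)-tons; larger values also admit more common variants.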

biolevol commented 3 years ago

Dear Nikos,

Thank you very much for your quick reply! I have run several trials using different values for the -c parameter (ranging from 2 to 6) and the results do not seem to vary greatly (at least the overall pattern of the peaks is very similar). Thank you again!

alachins / raisd

μ_SFS values are consistent along the genome #25