ANGSD / angsd

Program for analysing NGS data.

Confidence intervals for FST - significance of FST #157

Open clairemerot opened 6 years ago

clairemerot commented 6 years ago

Dear ANGSD users, have any of you tried to get confidence intervals for the FST values reported by ANGSD?

I think one way to do that is to bootstrap over loci, which might be possible using the SFS output, but I was wondering whether there is a more elegant solution.
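For illustration, a bootstrap over loci could be sketched as below. This is only a minimal sketch under stated assumptions, not something ANGSD provides: it assumes the per-site FST numerator and denominator terms have already been summed into per-window pairs (as far as I know `realSFS fst print` writes such per-site columns, but please check that against your version), and it resamples whole windows rather than single sites so that linkage within a window is kept. The names `weighted_fst`, `bootstrap_fst_ci` and `toy_blocks` are hypothetical, and the numbers are made up.

```python
import random

def weighted_fst(blocks):
    """Weighted FST: summed numerators divided by summed denominators."""
    num = sum(a for a, _ in blocks)
    den = sum(b for _, b in blocks)
    return num / den

def bootstrap_fst_ci(blocks, n_boot=1000, alpha=0.05, seed=0):
    """Percentile confidence interval from resampling blocks with replacement."""
    rng = random.Random(seed)
    n = len(blocks)
    estimates = []
    for _ in range(n_boot):
        # Draw n blocks with replacement and recompute the ratio of sums.
        resample = [blocks[rng.randrange(n)] for _ in range(n)]
        estimates.append(weighted_fst(resample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return weighted_fst(blocks), lower, upper

# Made-up per-window (numerator, denominator) sums, for illustration only.
toy_blocks = [(0.8, 10.0), (1.5, 12.0), (0.3, 9.0), (2.1, 11.0), (0.9, 10.5)]
point, lower, upper = bootstrap_fst_ci(toy_blocks)
print(f"FST = {point:.4f}, {100 * (1 - 0.05):.0f}% CI = [{lower:.4f}, {upper:.4f}]")
```

With real data the number of windows would be in the hundreds or thousands; the window size is a choice that should roughly exceed the scale of linkage disequilibrium.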

Have you also considered any way of assessing whether those FST values represent real differentiation or not? I guess one option is to check whether 0 falls within the confidence interval, but I expect that with many SNPs (whole-genome data) the confidence intervals will be very narrow. So I was also wondering about building a null distribution of pairwise FST by randomly assigning samples to the two populations, although that means re-running everything from the saf and SFS steps onward, which will be very demanding in time and resources.
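The label-shuffling idea could be sketched as follows. This is only an outline under assumptions: the file names are invented, `permute_labels` and `empirical_p_value` are hypothetical helpers, and the expensive part (re-running the saf, SFS and FST steps on every permuted pair of sample lists) is not shown. The null FST values below are placeholders, not results.

```python
import random

def permute_labels(pop1, pop2, seed=None):
    """Pool both sample lists, shuffle, and split back into the original sizes."""
    rng = random.Random(seed)
    pooled = list(pop1) + list(pop2)
    rng.shuffle(pooled)
    return pooled[:len(pop1)], pooled[len(pop1):]

def empirical_p_value(observed, null_values):
    """One-sided p-value: share of null FST values at least as large as observed.

    The +1 in numerator and denominator counts the observed value itself,
    so the p-value can never be exactly zero.
    """
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)

# Invented sample lists standing in for two BAM lists.
pop_a = ["a1.bam", "a2.bam", "a3.bam"]
pop_b = ["b1.bam", "b2.bam", "b3.bam", "b4.bam"]
perm_a, perm_b = permute_labels(pop_a, pop_b, seed=1)
print(perm_a, perm_b)

# Placeholder FST values, as if obtained from five permuted runs.
null_fst = [0.010, 0.012, 0.008, 0.015, 0.011]
p = empirical_p_value(0.05, null_fst)
print(f"empirical p = {p:.3f}")
```

Each permuted pair of lists would be written to disk and fed through the same pipeline as the real populations; the smallest attainable p-value is 1 / (number of permutations + 1), which sets how many runs are needed.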

I have very low-coverage data, so any analysis that can be done within the ANGSD framework is better than exporting a matrix of putative genotypes to other software.

Thank you for any input on these questions. Claire

yzongzjnu commented 5 years ago

I ran into the same situation. I tried the "bootstrap" argument of realSFS when calculating Fst, but there was no difference between the output with and without it. Is there a way to test whether the Fst output by realSFS is significant or not? What I am doing now is just setting an arbitrary threshold and treating Fst values above it as potential outliers. Maybe you have found a better way to do that. I would appreciate it if you could share your approach. Thank you!

jpcolella commented 4 years ago

I too would like to determine the significance (p-value? confidence interval?) of the Fst values output by ANGSD (from a sliding-window analysis). Did you ever find a solution to this?