Closed MichelMoser closed 7 years ago
Hi,
the weighted FST is not "summing up single site FSTs and divide by its total count". On multiple loci, FST is calculated as the ration between the sum across of all sites of the genetic variance between populations and the sum across all sites of the total genetic variance. So it's different from the simple average.
Best
Matteo
On 11 October 2017 at 09:19, MichelMoser notifications@github.com wrote:
Hello,
I try to reconstruct how sliding window calculation of FST by realSFS is done in ANGSD as i have to manually calculate sliding windows for subset of total sites in R.
normal output of Fst using 2 populations looks like follows:
(6027,17052)(22646,69611)(20000,70000) chr1 45000 11027 0.088167 (8681,17877)(30083,74498)(30000,80000) chr1 55000 9198 0.092517 (8963,20245)(42315,88037)(40000,90000) chr1 65000 11284 0.112481
i assume realSFS calculates the mean weighted FST (4th column i guess) within a given range (50Kb) of the chromosome. So summing up single site FSTs and divide by its total count.
I tried to reconstruct Fst value calculated for the first window in R but got different Fst values from what is reported: realSFS: 0.094536 own calculation: 0.006428877
Could you tell me what i am missing here?
(5917,14201)(13711,58617)(10000,60000) chr1 35000 8286 0.094536
window-start = 10000 window-end = 60000
head(recon) chrchr1 311 0.000000 0.000016 chrchr1 312 0.000000 0.000015 chrchr1 313 0.000000 0.000018 chrchr1 314 0.000000 0.000016 chrchr1 315 0.000000 0.000015 chrchr1 316 0.000000 0.000013 chrchr1 317 0.000000 0.000013 chrchr1 318 0.006853 0.084749 chrchr1 319 0.000000 0.000014
nrow(recon) [1] 8285
mean(recon$V4) [1] 0.006428877
Also could you elaborate what the first column in the sliding window Fst output means? specially the number in the first two brackets? (5917,14201)(13711,58617)(10000,60000) May be the answer i look for is already in there and i took a wrong window to comput mean Fst....
(5917,14201)(13711,58617)(10000,60000) chr1 35000 8286 0.094536
Thank you, Michel
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/ANGSD/angsd/issues/112, or mute the thread https://github.com/notifications/unsubscribe-auth/ACGvCZ9zQK2Rz5XwX3-OMsttXh1LjQMtks5srHoEgaJpZM4P1FRZ .
Thank you for clarfying this and the quick answer. And sorry for my ignorance, as it is actually stated in your tutorials https://github.com/mfumagalli/ngsTools/blob/master/TUTORIAL.md
Best Michel
Short follow-up question:
Is this also true for Dxy values in sliding windows?
Hello,
I try to reconstruct how sliding window calculation of FST by realSFS is done in ANGSD as i have to manually calculate sliding windows for subset of total sites in R.
normal output of Fst using 2 populations looks like follows:
i assume realSFS calculates the mean weighted FST (4th column i guess) within a given range (50Kb) of the chromosome. So summing up single site FSTs and divide by its total count.
I tried to reconstruct Fst value calculated for the first window in R but got different Fst values from what is reported: realSFS: 0.094536 own calculation: 0.006428877
Could you tell me what i am missing here?
Also could you elaborate what the first column in the sliding window Fst output means? specially the number in the first two brackets?
(5917,14201)(13711,58617)(10000,60000) May be the answer i look for is already in there and i took a wrong window to comput mean Fst....
(5917,14201)(13711,58617)(10000,60000) chr1 35000 8286 0.094536
Thank you, Michel