ANGSD / angsd

Program for analysing NGS data.
230 stars 50 forks source link

cant reconstruct sliding window Fst value #112

Closed MichelMoser closed 7 years ago

MichelMoser commented 7 years ago

Hello,

I try to reconstruct how sliding window calculation of FST by realSFS is done in ANGSD as i have to manually calculate sliding windows for subset of total sites in R.

normal output of Fst using 2 populations looks like follows:

(6027,17052)(22646,69611)(20000,70000)   chr1 45000  11027 0.088167
 (8681,17877)(30083,74498)(30000,80000)   chr1 55000   9198 0.092517
(8963,20245)(42315,88037)(40000,90000)   chr1 65000  11284 0.112481

i assume realSFS calculates the mean weighted FST (4th column i guess) within a given range (50Kb) of the chromosome. So summing up single site FSTs and divide by its total count.

I tried to reconstruct Fst value calculated for the first window in R but got different Fst values from what is reported: realSFS: 0.094536 own calculation: 0.006428877

Could you tell me what i am missing here?

(5917,14201)(13711,58617)(10000,60000)   chr1 35000   8286 0.094536

window-start = 10000
window-end = 60000

head(recon)
chrchr1 311     0.000000        0.000016
chrchr1 312     0.000000        0.000015
chrchr1 313     0.000000        0.000018
chrchr1 314     0.000000        0.000016
chrchr1 315     0.000000        0.000015
chrchr1 316     0.000000        0.000013
chrchr1 317     0.000000        0.000013
chrchr1 318     0.006853        0.084749
chrchr1 319     0.000000        0.000014

nrow(recon)
[1] 8285
> mean(recon$V4)
[1] 0.006428877

Also could you elaborate what the first column in the sliding window Fst output means? specially the number in the first two brackets?
(5917,14201)(13711,58617)(10000,60000) May be the answer i look for is already in there and i took a wrong window to comput mean Fst....

(5917,14201)(13711,58617)(10000,60000) chr1 35000 8286 0.094536

Thank you, Michel

mfumagalli commented 7 years ago

Hi,

the weighted FST is not "summing up single site FSTs and divide by its total count". On multiple loci, FST is calculated as the ration between the sum across of all sites of the genetic variance between populations and the sum across all sites of the total genetic variance. So it's different from the simple average.

Best

Matteo

On 11 October 2017 at 09:19, MichelMoser notifications@github.com wrote:

Hello,

I try to reconstruct how sliding window calculation of FST by realSFS is done in ANGSD as i have to manually calculate sliding windows for subset of total sites in R.

normal output of Fst using 2 populations looks like follows:

(6027,17052)(22646,69611)(20000,70000) chr1 45000 11027 0.088167 (8681,17877)(30083,74498)(30000,80000) chr1 55000 9198 0.092517 (8963,20245)(42315,88037)(40000,90000) chr1 65000 11284 0.112481

i assume realSFS calculates the mean weighted FST (4th column i guess) within a given range (50Kb) of the chromosome. So summing up single site FSTs and divide by its total count.

I tried to reconstruct Fst value calculated for the first window in R but got different Fst values from what is reported: realSFS: 0.094536 own calculation: 0.006428877

Could you tell me what i am missing here?

(5917,14201)(13711,58617)(10000,60000) chr1 35000 8286 0.094536

window-start = 10000 window-end = 60000

head(recon) chrchr1 311 0.000000 0.000016 chrchr1 312 0.000000 0.000015 chrchr1 313 0.000000 0.000018 chrchr1 314 0.000000 0.000016 chrchr1 315 0.000000 0.000015 chrchr1 316 0.000000 0.000013 chrchr1 317 0.000000 0.000013 chrchr1 318 0.006853 0.084749 chrchr1 319 0.000000 0.000014

nrow(recon) [1] 8285

mean(recon$V4) [1] 0.006428877

Also could you elaborate what the first column in the sliding window Fst output means? specially the number in the first two brackets? (5917,14201)(13711,58617)(10000,60000) May be the answer i look for is already in there and i took a wrong window to comput mean Fst....

(5917,14201)(13711,58617)(10000,60000) chr1 35000 8286 0.094536

Thank you, Michel

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/ANGSD/angsd/issues/112, or mute the thread https://github.com/notifications/unsubscribe-auth/ACGvCZ9zQK2Rz5XwX3-OMsttXh1LjQMtks5srHoEgaJpZM4P1FRZ .

MichelMoser commented 7 years ago

Thank you for clarfying this and the quick answer. And sorry for my ignorance, as it is actually stated in your tutorials https://github.com/mfumagalli/ngsTools/blob/master/TUTORIAL.md

Best Michel

Short follow-up question:

Is this also true for Dxy values in sliding windows?