bvilhjal / ldpred

MIT License

LDpred wouldn't work for large sample size small rho #55

Closed wavefancy closed 5 years ago

wavefancy commented 5 years ago

Hi Bjarni J. Vilhjálmsson,

Thanks much for your support of LDpred.

Could you please help us with something we find very hard to understand? LDpred works well across all rho values (the fraction of causal variants, LDpred's p) for GWAS summary statistics estimated from a smaller sample (N = 185000). However, after we meta-analyzed with new samples, the sample size is now 1140845, and LDpred only works for large rho; it has no power for small rho. Shouldn't power always improve as the sample size increases? Why does power decrease for small rho?

Below is the R2 for the comparison. With the large sample size (1140845), it does not work for small rho:

LDpred_p3.0000e-01  NagelkerkeR2:  num  0.03
LDpred_p1.0000e-01  NagelkerkeR2:  num  0.0282
LDpred-inf          NagelkerkeR2:  num  0.0265
LDpred_p1.0000e+00  NagelkerkeR2:  num  0.0254
LDpred_p3.0000e-02  NagelkerkeR2:  num  0.0186
# no power
LDpred_p1.0000e-02  NagelkerkeR2:  num  0.00155
LDpred_p1.0000e-03  NagelkerkeR2:  num  0.000522
LDpred_p3.0000e-03  NagelkerkeR2:  num  0.000387

With the small sample size (185000), it works well for small rho:

# good power for small rho
LDpred_p1.0000e-03  NagelkerkeR2:  num  0.028
LDpred_p3.0000e-03  NagelkerkeR2:  num  0.0249
LDpred_p1.0000e-02  NagelkerkeR2:  num  0.0183
LDpred_p3.0000e-02  NagelkerkeR2:  num  0.0135
LDpred_p1.0000e-01  NagelkerkeR2:  num  0.011
LDpred-inf          NagelkerkeR2:  num  0.0104
LDpred_p3.0000e-01  NagelkerkeR2:  num  0.0104
LDpred_p1.0000e+00  NagelkerkeR2:  num  0.0103

Thanks much for your support. Best regards Wallace

bvilhjal commented 5 years ago

Hi Wallace, I'm worried that the reason is that LDpred relies on a model for the Gibbs sampler, and the Gibbs sampler may become unstable with large sample sizes if the model assumptions are wrong. Basically, it's less robust to violations of model assumptions when sample sizes are large.

Are you using the latest version for your analysis?

Best, Bjarni

wavefancy commented 5 years ago

Hi Bjarni,

Thanks much for your quick reply.

Yes, I noticed a few warnings about the Gibbs sampler not converging for a few chromosomes. However, when I removed those chromosomes, there was still no power for small rho.

Sorry, I used the old pip version, with which I can run chromosome by chromosome. Has the stability of the Gibbs sampler been improved in the newer version? Also, do you have any suggestions for what I should do about the few chromosomes where the Gibbs sampler does not converge? Will this affect prediction power? By the way, does the newer version support running chromosome by chromosome?

Thanks so much for your great support.

Best regards Wallace

wavefancy commented 5 years ago

Hi Bjarni,

For a case/control study where the numbers of cases and controls are imbalanced, should I use the effective sample size or just the total number of cases and controls?

Best regards Wallace

bvilhjal commented 5 years ago

Hi Wallace,

I'm not sure, but using the effective sample size sounds like a good idea. I suggest you try both.
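For reference, one commonly used definition of the effective sample size in a case/control study is N_eff = 4 / (1/N_cases + 1/N_controls), which equals the total N when cases and controls are balanced and shrinks as they become imbalanced. A minimal sketch in Python; the function name and the example counts are hypothetical and are not part of LDpred:

```python
def effective_sample_size(n_cases: int, n_controls: int) -> float:
    """Effective sample size for a case/control study.

    Uses the common definition N_eff = 4 / (1/N_cases + 1/N_controls).
    This is an illustrative helper, not an LDpred function.
    """
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("n_cases and n_controls must both be positive")
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


if __name__ == "__main__":
    # Hypothetical counts: a balanced and an imbalanced design, both with 2000 samples.
    print(effective_sample_size(1000, 1000))
    print(effective_sample_size(100, 1900))
```

With imbalanced designs the effective N can be far below the total, so trying both values, as suggested above, shows how sensitive the results are to this choice.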

Regarding the general issue of training on summary stats from very large samples: I plan to improve on that in the coming months, and I'll keep you updated.

Best, Bjarni

wavefancy commented 5 years ago

Hi Bjarni,

Just wanted to follow up: are there any updates on the large N problem?

Best regards Wallace

hershwin commented 5 years ago

Hi, I am also having a similar problem; my N is 200000. @wavefancy, were you able to find a solution? Thanks for your help!

wavefancy commented 5 years ago

@hershwin, sorry, I haven't.

bvilhjal commented 5 years ago

I think I've now improved convergence in LDpred when applied to large samples. Another possible improvement can be obtained by excluding regions of long-range LD (to come in version 1.10). I'll close this issue now.
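Until that version is available, long-range LD regions can also be dropped from the summary statistics before running LDpred. A minimal sketch of such a filter; this is not LDpred's implementation, the function names are hypothetical, and the single region listed (an approximate extended MHC window on chromosome 6) is only illustrative and should be replaced with a curated list for your genome build:

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start_bp, end_bp), inclusive
Snp = Tuple[str, str, int]     # (snp_id, chromosome, position_bp)

# Illustrative placeholder only: substitute a curated long-range LD list.
EXAMPLE_REGIONS: List[Region] = [("6", 25_000_000, 35_000_000)]


def in_any_region(chrom: str, pos: int, regions: Iterable[Region]) -> bool:
    """Return True if (chrom, pos) falls inside any of the given regions."""
    return any(chrom == c and start <= pos <= end for c, start, end in regions)


def exclude_long_range_ld(snps: Iterable[Snp], regions: List[Region]) -> List[Snp]:
    """Keep only the SNPs that lie outside all of the given regions."""
    return [snp for snp in snps if not in_any_region(snp[1], snp[2], regions)]


if __name__ == "__main__":
    # Hypothetical SNPs: one inside the example region, two outside it.
    snps = [("rs_a", "6", 30_000_000), ("rs_b", "6", 40_000_000), ("rs_c", "1", 30_000_000)]
    print([snp[0] for snp in exclude_long_range_ld(snps, EXAMPLE_REGIONS)])
```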

Thanks, Bjarni

wavefancy commented 5 years ago

This is so great, thank you so much!

Best regards Wallace


-- Best regards Wallace(Minxian) Wang

Computational Biologist, Khera Lab. Broad Institute of MIT and Harvard 415 Main St, Cambridge, MA 02142

bcrone commented 4 years ago

Hi Bjarni,

Jumping on this issue rather than opening a new one. I'm working with the most recent version of LDpred and am encountering convergence issues with large N (N = 772555). I'm having difficulty understanding the underlying issue with the LDpred model with regard to large sample sizes. Could you help explain this?

bvilhjal commented 4 years ago

Hi Brad, could you please provide more details? Which models fail? How large is the heritability? Perhaps also the parameters you used to run coord and gibbs.

bcrone commented 4 years ago

Thanks for the reply. I'm running LDpred v1.0.11. From the Gibbs log:

LD radius: 364
Genome-wide estimated heritability: 0.1150
Chi-square lambda: 2.5774

I achieve convergence for causal fractions f=1 and f=0.3, and fail for all others.
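To help interpret the "Chi-square lambda" value, here is a minimal sketch of two common inflation summaries computed from effect sizes and standard errors: the mean chi-square statistic and the genomic-control lambda (median chi-square divided by about 0.4549, the median of a 1-df chi-square distribution). Which of these the LDpred log reports is not stated in this thread, so treat the mapping as an assumption; the function names and toy inputs are hypothetical.

```python
from statistics import mean, median
from typing import List, Sequence

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def chi_square_stats(betas: Sequence[float], ses: Sequence[float]) -> List[float]:
    """Per-SNP chi-square statistics, computed as (beta / se) ** 2."""
    if len(betas) != len(ses):
        raise ValueError("betas and ses must have the same length")
    return [(b / s) ** 2 for b, s in zip(betas, ses)]


def mean_chi_square(chi2: Sequence[float]) -> float:
    """Mean chi-square; values well above 1 indicate strong signal or inflation."""
    return mean(chi2)


def genomic_control_lambda(chi2: Sequence[float]) -> float:
    """Genomic-control lambda: median chi-square over the 1-df null median."""
    return median(chi2) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Toy inputs, not real summary statistics.
    chi2 = chi_square_stats([0.1, 0.2, 0.3], [0.1, 0.1, 0.1])
    print(mean_chi_square(chi2), genomic_control_lambda(chi2))
```

With very large N, both summaries are expected to rise for a polygenic trait even without confounding, so a value such as 2.58 does not by itself indicate a problem with the summary statistics.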


bvilhjal commented 4 years ago

Hi Brad,

Thanks a lot for this information, it's really useful for trying to understand how I can improve LDpred. Did you set a sample size parameter, --N (which I wouldn't generally recommend)?

Also, what about the parameters you used for the "coord" step. Do you have those? I'm mostly interested in seeing whether you set a sample size parameter, allele frequency thresholds, etc.

Finally, you could try using --only-hm3 in the coord step.

Best, Bjarni

bcrone commented 4 years ago

Hi Bjarni,

These are the parameters I’m invoking for the coord step: chr, pos, A1/A2, reffreq, eff, se, pval, rs, ncol, eff_type

I tested the recommendation of removing the --N parameter from the gibbs step, but I still have the same convergence issues.

Thanks, Brad
