Closed wavefancy closed 5 years ago
Hi Wallace, I suspect the cause is that LDpred's Gibbs sampler relies on a model, and the sampler may become unstable with large sample sizes if the model assumptions are wrong. Basically, it's less robust to violations of the model assumptions when sample sizes are large.
Are you using the latest version for your analysis?
Best, Bjarni
Hi Bjarni,
Thanks much for your quick reply.
Yes, I noticed a few warnings about the Gibbs sampler not converging for a few chromosomes. However, when I removed those chromosomes, there was still no power for small pho.
Sorry, I used the old pip version, where I can run it chromosome by chromosome. Has the newer version improved the stability of the Gibbs sampling? Also, do you have any suggestions for what I should do about the few chromosomes where the Gibbs sampler does not converge? Will this affect prediction power? By the way, does the newer version support chromosome-by-chromosome runs?
Thanks so much for your great support.
Best regards Wallace
Hi Bjarni,
For a case/control study where the numbers of cases and controls are imbalanced, should I use the effective sample size or just the total number of cases and controls?
Best regards Wallace
Hi Wallace,
I'm not sure, it sounds like a good idea to use the effective sample size. I suggest you try both.
Regarding the general issue of training on summary stats from very large samples, I plan to improve on that in the coming months. I'll keep you updated.
Best, Bjarni
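For readers comparing the two options, here is a minimal sketch of one commonly used definition of effective sample size for a case/control study, N_eff = 4 / (1/N_cases + 1/N_controls). The function name and the example counts are hypothetical, and the thread does not confirm which definition LDpred expects, so treat this as an illustration only.

```python
def effective_sample_size(n_cases: int, n_controls: int) -> float:
    """Effective sample size of a case/control study.

    Uses the common definition 4 / (1/n_cases + 1/n_controls), which equals
    the total sample size when cases and controls are balanced.
    """
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("n_cases and n_controls must be positive")
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


# Balanced design (hypothetical counts): effective size equals the total.
balanced = effective_sample_size(5000, 5000)

# Imbalanced design (hypothetical counts): effective size is far below
# the total of 100000 samples.
imbalanced = effective_sample_size(5000, 95000)
```

With a strongly imbalanced design the two candidate values of N can differ several-fold, which is why trying both, as suggested above, is a reasonable check.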
Hi Bjarni,
Just following up: are there any updates on the large N problem?
Best regards Wallace
Hi, I am also having a similar problem, my N is 200000 - @wavefancy were you able to find a solution? Thanks for your help!
@hershwin, sorry, I haven't found one.
I think I've now improved convergence in LDpred when applied to large samples. Another possible improvement can be obtained by excluding regions of long-range LD (to come in version 1.10). I'll close this issue now.
Thanks, Bjarni
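Until that version is available, excluding long-range LD regions can be approximated by filtering the SNP list before running LDpred. The following is a rough sketch only: the region coordinates are illustrative placeholders (an approximate MHC window and an approximate chromosome 8 inversion window), not LDpred's own list, they depend on the genome build, and the function names and SNP identifiers are hypothetical.

```python
from typing import List, Sequence, Tuple

Region = Tuple[str, int, int]  # (chromosome, start_bp, end_bp)
Snp = Tuple[str, str, int]     # (snp_id, chromosome, position_bp)

# Illustrative placeholder regions; NOT the list used by LDpred.
# Coordinates are approximate and build-dependent.
LONG_RANGE_LD_REGIONS: List[Region] = [
    ("6", 25_000_000, 35_000_000),  # approximate MHC window
    ("8", 7_000_000, 13_000_000),   # approximate chr8 inversion window
]


def in_long_range_ld(chrom: str, pos: int,
                     regions: Sequence[Region] = LONG_RANGE_LD_REGIONS) -> bool:
    """Return True if the position falls inside any listed region."""
    return any(chrom == c and start <= pos <= end for c, start, end in regions)


def exclude_long_range_ld(snps: Sequence[Snp],
                          regions: Sequence[Region] = LONG_RANGE_LD_REGIONS) -> List[Snp]:
    """Keep only SNPs that lie outside all listed regions."""
    return [s for s in snps if not in_long_range_ld(s[1], s[2], regions)]


# Hypothetical SNPs: the second one lies inside the placeholder MHC window.
example_snps = [
    ("snp_a", "1", 1_000_000),
    ("snp_b", "6", 30_000_000),
    ("snp_c", "6", 40_000_000),
]
kept = exclude_long_range_ld(example_snps)
```

The retained SNP IDs could then be used to subset the summary statistics before the coordination step.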
This is so great, thank you so much!
Best regards Wallace
-- Best regards Wallace(Minxian) Wang
Computational Biologist, Khera Lab. Broad Institute of MIT and Harvard 415 Main St, Cambridge, MA 02142
Hi Bjarni,
Jumping on this issue rather than opening a new one. I'm working with the most recent version of LDPred and am encountering convergence issues with large N (N = 772555). I'm having difficulty understanding the underlying issue with the LDPred model at large sample sizes. Could you help explain this?
Hi Brad, could you please provide more details? What models fail? How large is the heritability? Perhaps you could also share the parameters you used to run coord and gibbs.
Thanks for the reply. I’m running LDPred v1.0.11. From the Gibbs log:
LD radius: 364
Genome-wide estimated heritability: 0.1150
Chi-square lambda: 2.5774
I achieve convergence for causal fractions f=1 and f=.3; all others fail.
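As background on how the logged quantities relate to sample size, here is a small sketch of the standard LD score relationship, mean chi-square ≈ 1 + N · h2 · (average LD score) / (number of SNPs). It assumes the logged "Chi-square lambda" is a mean chi-square statistic and that the heritability estimate follows this relationship; neither is confirmed in this thread. The function names and the SNP count and average LD score in the example are hypothetical.

```python
def expected_mean_chisq(h2: float, n: float, num_snps: int,
                        avg_ld_score: float) -> float:
    """Expected mean chi-square under the LD score relationship
    (intercept fixed at 1, i.e. no confounding)."""
    return 1.0 + n * h2 * avg_ld_score / num_snps


def h2_from_mean_chisq(mean_chisq: float, n: float, num_snps: int,
                       avg_ld_score: float) -> float:
    """Invert the relationship above to get a heritability estimate.

    Values of mean_chisq below 1 are clipped to 1, giving h2 = 0.
    """
    excess = max(mean_chisq, 1.0) - 1.0
    return excess * num_snps / (n * avg_ld_score)


# Hypothetical inputs: 1,000,000 SNPs with an average LD score of 20.
M, AVG_LD = 1_000_000, 20.0

# For a fixed heritability, the excess over 1 grows linearly with N.
small_n = expected_mean_chisq(0.1, 100_000, M, AVG_LD)
large_n = expected_mean_chisq(0.1, 1_000_000, M, AVG_LD)

# Round trip: the estimate recovers the heritability that was put in.
recovered = h2_from_mean_chisq(large_n, 1_000_000, M, AVG_LD)
```

Under these assumptions, a larger N inflates the test statistics for the same heritability, which is one way to see why summary statistics from very large samples can behave differently.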
Hi Brad,
Thanks a lot for this information, it's really useful for trying to understand how I can improve LDpred. Did you set a sample size parameter, --N (which I wouldn't generally recommend)?
Also, what about the parameters you used for the "coord" step? Do you have those? I'm mostly interested in seeing whether you set a sample size parameter, allele frequency thresholds, etc.
Finally, you could try using --only-hm3 in the coord step.
Best, Bjarni
Hi Bjarni,
These are the parameters I’m invoking for the coord step: chr, pos, A1/A2, reffreq, eff, se, pval, rs, ncol, eff_type
I tested the recommendation of removing the --N parameter from the Gibbs step, but I still have the same convergence issues.
Thanks, Brad
Hi Bjarni J. Vilhjálmsson,
Thanks much for your support of LDpred.
Can you please help us with one thing that is very hard for us to understand? LDpred works well across all pho values for GWAS summary statistics estimated from a small sample size (185000). However, after we meta-analyzed with new samples, the sample size is now 1140845, and LDpred only works for large pho; it has no power for small pho. Shouldn't power always improve as the sample size increases? Why does power decrease for small pho values?
Below is the R2 for the comparison. With the large sample size (1140845), LDpred does not work for small pho values:
With the small sample size (185000), it works well for small pho values:
Thanks much for your support. Best regards Wallace