bvilhjal / ldpred

error in gibbs #70

Closed hershwin closed 4 years ago

hershwin commented 5 years ago

Hi Bjarni,

Thank you for updating ldpred. I am having some trouble with step 2. I have successfully coordinated the data and am now trying to generate LDpred SNP weights. Regardless of the LD radius I choose, it says that 0 SNP effects were found, and I get a final error that says: "AssertionError: Something is wrong with the GWAS summary statistics, parsing of them, or the given GWAS sample size (N). Lambda (the mean Chi-square statistic) is too small."

Do you know what could be causing this error? Thanks so much for your help.

Yixuan

Ambrosinae commented 5 years ago

I'm not sure if this helps, but I've had this error before with the Lambda being too small although I haven't had 0 SNP effects before. LDpred seems to assert lambda being > 1 before continuing, so if you have 0 SNP effects I don't think it passes the assert. Maybe there's something wrong with the format of your GWAS Summary Statistic file?

chaggerty commented 5 years ago

@Ambrosinae What was the solution in your case? I just encountered this error for a single chromosome (8) with 350k SNPs found. I'm using the same GWAS summary stats file for all chromosomes and have no issues with any of the others, so it's a bit baffling.

chaggerty commented 5 years ago

Ok, so in my case, because I'm running the analysis separately for each chromosome, that assertion is tricky since it is meant to be the mean lambda across chromosomes, so I've chosen to ignore it in this instance.
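To illustrate the point about per-chromosome runs, here is a minimal sketch (not LDpred's own code) that computes the mean chi-square statistic per chromosome and across all SNPs. The input layout, a list of `(chromosome, z_score)` pairs, and the function name are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

def mean_chi_square_by_chromosome(records):
    """records: iterable of (chromosome, z_score) pairs (hypothetical layout)."""
    per_chrom = defaultdict(list)
    for chrom, z in records:
        per_chrom[chrom].append(z * z)  # a 1-df chi-square statistic is z squared
    chrom_lambda = {c: mean(v) for c, v in per_chrom.items()}
    all_chi2 = [x for v in per_chrom.values() for x in v]
    return chrom_lambda, mean(all_chi2)

# Toy data: chromosome 8 falls just under 1 while the genome-wide mean does not.
records = [("7", 1.4), ("7", -1.1), ("8", 0.9), ("8", -1.0), ("9", 1.3), ("9", 0.8)]
chrom_lambda, genome_lambda = mean_chi_square_by_chromosome(records)
print(chrom_lambda["8"], genome_lambda)
```

With toy numbers like these, a single chromosome can fail a `> 1` check even though the genome-wide mean passes it, which matches the behaviour described above.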

Ambrosinae commented 5 years ago

@chaggerty Yeah, I ended up just ignoring it for now by editing the code to accept lower lambdas.

bnj50 commented 5 years ago

I have the same issue. Can you please elaborate on how to modify the lambda on the command line? I don't see this option in the help for LDpred1.py coord (or gibbs). Thanks.

Ambrosinae commented 5 years ago

Sorry if I wasn't clear, I didn't use the command line, I just edited a local copy of the code in the /ldpred/ld.py file on line 327.

bnj50 commented 5 years ago

Thanks. I guess you mean the line below. What did you change here?

assert chi_square_lambda>1, 'Something is wrong with the GWAS summary statistics, parsing of them, or the given GWAS sample size (N). Lambda (the mean Chi-square statistic) is too small.  '

Ambrosinae commented 5 years ago

I was getting about 0.996, so I just set it to:

assert chi_square_lambda>0.9, 'Something is wrong with the GWAS summary statistics, parsing of them, or the given GWAS sample size (N). Lambda (the mean Chi-square statistic) is too small. '
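As an alternative to hard-coding a new number in the assertion, the check could be written with an adjustable threshold and an option to warn instead of stopping. This is only a sketch: `check_lambda`, `min_lambda`, and `strict` are hypothetical names and are not part of LDpred.

```python
import warnings

def check_lambda(chi_square_lambda, min_lambda=1.0, strict=True):
    """Hypothetical QC helper: return True if lambda exceeds min_lambda.

    With strict=True a failing value raises ValueError; otherwise it only warns.
    """
    if chi_square_lambda > min_lambda:
        return True
    msg = (
        "Lambda (the mean Chi-square statistic) is %.4f, which is not above "
        "the threshold %.4f. Check the summary statistics, their parsing, "
        "and the GWAS sample size (N)." % (chi_square_lambda, min_lambda)
    )
    if strict:
        raise ValueError(msg)
    warnings.warn(msg)
    return False

# The value reported above passes once the threshold is lowered to 0.9.
print(check_lambda(0.996, min_lambda=0.9))
```

Reporting the observed value in the message makes it easier to tell a lambda that is barely below 1 from one that is far below it.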

bnj50 commented 5 years ago

Thanks. I don't know if the author plans to fix the bug, but I may follow your hint!

Ambrosinae commented 5 years ago

No problem!

bnj50 commented 5 years ago

I made those changes but it doesn't make any difference in my case, so I hope this bug will be fixed soon. It looks like the issue is still open.

marielohcs commented 5 years ago

Hmm, not sure if it is related, but I am getting the opposite! My lambda appears to be too big... When I run the LDpred step (after supposedly successfully coordinating), I get "Genome-wide lambda inflation: inf" and my LDpred step stops running.
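One thing that may be worth checking for an `inf` lambda is whether any single per-SNP statistic is itself infinite or NaN, since one such value is enough to make a mean infinite. The sketch below is generic Python and makes no claim about where such a value would arise inside LDpred; `find_non_finite` is a hypothetical helper.

```python
import math

def find_non_finite(values):
    """Return the positions of values that are inf or NaN."""
    return [i for i, v in enumerate(values) if not math.isfinite(v)]

# Toy per-SNP chi-square statistics; the third one is infinite.
chi2 = [1.2, 0.4, float("inf"), 2.1]
bad_positions = find_non_finite(chi2)
mean_chi2 = sum(chi2) / len(chi2)  # a single inf makes the mean inf
print(bad_positions, mean_chi2)
```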

Any idea what might be going on, or what I should try? Thanks!

Sabor117 commented 5 years ago

Has there been any solution to this issue? I appear to be having a similar problem when running the Gibbs sampler:

AssertionError: The posterior mean is not a real number? Possibly due to problems with summary stats, LD estimates, or parameter settings.

I too have been running LDpred with the input summary stats split by chromosome in order to reduce the computational power required for it, so am wondering if this may be part of the issue (although I had assumed this would be fixed if I included the heritability of the SNPs with the --h argument, but perhaps I am misunderstanding this).

I also am wondering if it is anything to do with my input values as I have been working with a dataset with a huge sample size (250000) and wonder if this is throwing off the script in some way.

chaggerty commented 5 years ago

@Sabor117 check your coordination step to ensure that you are using the --beta flag if your summary stats have beta values (and not OR). I ran into that problem and realized I was inputting betas and they were being handled as OR (hence the real number issues).
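A small sketch of why betas treated as odds ratios can produce values that are not real numbers: an odds ratio is converted to the log-odds scale with a logarithm, which is undefined for zero or negative inputs, and betas are often negative. The function names below are hypothetical and this is not LDpred's parsing code.

```python
import math

def to_log_odds(effect, effect_is_odds_ratio):
    """Return the effect on the log-odds scale (hypothetical helper)."""
    if effect_is_odds_ratio:
        if effect <= 0:
            raise ValueError("An odds ratio must be positive, got %r" % effect)
        return math.log(effect)
    return effect  # already a beta (log-odds or linear effect)

def looks_like_betas(effects):
    """Heuristic: odds ratios are strictly positive, so any value <= 0 suggests betas."""
    return any(e <= 0 for e in effects)

print(to_log_odds(1.2, True))            # log of an odds ratio
print(looks_like_betas([0.02, -0.05]))   # negative values, so probably betas
```

A check like `looks_like_betas` on the effect column before coordination can catch the mismatch early.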

chaggerty commented 5 years ago

@namjoub2 what lambda value are you getting? I just commented out that assertion line in my code rather than modifying the threshold (because I knew it was just barely < 1) but I don't know what the implications are if it's <<1

bnj50 commented 5 years ago

@chaggerty It is above 1:

bash-4.1$ LDpred1.py gibbs --cf test-ld-predB --ldr 2000 --N 9677 --out gibbs --ldf LDF

Calculating LD information w. radius 2000
Storing LD information to compressed pickle file
Applying LDpred with LD radius: 2000
1158826 SNP effects were found
Traceback (most recent call last):
  File "/usr/local/ldpred/1.0.0/LDpred1.py", line 300, in <module>
    main()
  File "/usr/local/ldpred/1.0.0/LDpred1.py", line 287, in main
    LDpred_gibbs.main(p_dict)
  File "/usr/local/ldpred/1.0.0/ldpred1/LDpred_gibbs.py", line 366, in main
    h2=p_dict['h2'], verbose=p_dict['debug'], summary_dict=summary_dict)
  File "/usr/local/ldpred/1.0.0/ldpred1/LDpred_gibbs.py", line 186, in ldpred_genomewide
    herit_dict = ld.get_chromosome_herits(cord_data_g, ld_scores_dict, n, h2=h2, debug=verbose,summary_dict=summary_dict)
  File "/usr/local/ldpred/1.0.0/ldpred1/ld.py", line 327, in get_chromosome_herits
    assert chi_square_lambda>1, 'Something is wrong with the GWAS summary statistics, parsing of them, or the given GWAS sample size (N). Lambda (the mean Chi-square statistic) is too small. '
AssertionError: Something is wrong with the GWAS summary statistics, parsing of them, or the given GWAS sample size (N). Lambda (the mean Chi-square statistic) is too small.

bvilhjal commented 5 years ago

Hi, I apologize for the slow reply.

The aim of calculating the lambda Chi-square statistic inflation factor is to QC the summary statistics. Basically, it checks whether the effects are on the right scale. If not, then it suggests that there is something wrong with the summary statistics, parsing of the summary stats, or perhaps the N value supplied.
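To make the dependence on N concrete, here is a minimal sketch under a stated assumption: effects are stored as standardized betas, b = z / sqrt(N), and the chi-square statistic is recovered as N * b^2. Whether LDpred computes it in exactly this way is not confirmed here; the function name and toy numbers are illustrative.

```python
from statistics import mean

def mean_chi_square(standardized_betas, n):
    """Mean of N * b^2 over SNPs, assuming b = z / sqrt(N) (an assumption)."""
    return mean(n * b * b for b in standardized_betas)

z_scores = [1.5, -0.7, 1.1, -1.3, 0.4]
true_n = 10000
betas = [z / true_n ** 0.5 for z in z_scores]

lambda_correct = mean_chi_square(betas, true_n)      # uses the true N
lambda_half_n = mean_chi_square(betas, true_n // 2)  # N understated by half
print(lambda_correct, lambda_half_n)
```

Under this assumption the mean chi-square scales linearly with the supplied N, so supplying an N that is too small can push lambda below 1 even when the summary statistics themselves are fine.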

I will keep this issue open with the aim of implementing something that provides more information to help identify the problem.

Sabor117 commented 5 years ago

> @Sabor117 check your coordination step to ensure that you are using the --beta flag if your summary stats have beta values (and not OR). I ran into that problem and realized I was inputting betas and they were being handled as OR (hence the real number issues).

This actually was the issue I think! I hadn't noticed this particular flag! Thank you!

hrafnfaedhir commented 4 years ago

Hello, I am just starting to use the software and I am getting a similar error. We are hoping to use LDpred to try and calculate risk scores for chrMT. I know this chromosome is ignored in the coord step, but we've put them on an autosomal chromosome. (At this point, we are exploring your software, but are chrMT and Y excluded because they haven't been explored or because they require specific coding alterations for haploid genotypes?) We also used PCs to control the effects of population stratification when we calculated our Summary Statistics, using a logistic regression model, and this produced a Genomic Inflation Factor (lambda) of 0.8138 and we get the error:

AssertionError: Something is wrong with the GWAS summary statistics, parsing of them, or the given GWAS sample size (N). Lambda (the mean Chi-square statistic) is too small.

When we run the summary statistics without the popStrat PCs, we get a lambda of 0.9769 which still errors out. I can understand adjusting the statistics for a GIF over 1, but is it going to be problematic to have lambda less than one? How about a lambda much less than 1? Do you recommend the inclusion of population stratification PCs and other covariates during the calculation of summary statistics? I also suspect that the complication may be due to having a small number of markers ~360.
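On the suspicion about the small number of markers, a rough back-of-the-envelope sketch: under the null, a 1-df chi-square statistic has mean 1 and variance 2, so the mean over M independent markers has standard error sqrt(2 / M). Independence is an assumption that markers in LD will not satisfy, so the real uncertainty would be larger. The example also treats the reported 0.9769 as if it were a mean chi-square, which is a simplification.

```python
import math

def null_se_of_mean_chi_square(n_markers):
    """Standard error of the mean of n independent 1-df chi-square statistics."""
    return math.sqrt(2.0 / n_markers)

se_360 = null_se_of_mean_chi_square(360)
# How many standard errors below 1 is a value of 0.9769?
deviation_in_se = (1.0 - 0.9769) / se_360
print(round(se_360, 4), round(deviation_in_se, 2))
```

With about 360 markers the standard error is roughly 0.07, so a value of 0.9769 sits well within one standard error of 1 under these assumptions.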

hrafnfaedhir commented 4 years ago

I was able to rectify this error by dividing the BETA values by the inflation factor and then recalculating the p-values. This brought lambda over 1 and allowed the gibbs step to complete.
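The rescaling described above could look roughly like the sketch below. It assumes the standard errors are left unchanged and that p-values come from a two-sided normal test on beta / SE; neither detail is stated in the comment, and the function name is hypothetical.

```python
import math

def rescale_and_recompute(beta, se, inflation_factor):
    """Divide beta by the inflation factor and recompute a two-sided p-value.

    Assumes se is kept as-is and z = beta / se is approximately standard normal.
    """
    beta_adj = beta / inflation_factor
    z = beta_adj / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta_adj, p

beta_adj, p_adj = rescale_and_recompute(0.196, 0.1, 0.8138)
_, p_orig = rescale_and_recompute(0.196, 0.1, 1.0)
print(beta_adj, p_adj, p_orig)
```

Dividing by a factor below 1 enlarges every beta, so each squared z-score grows by 1 / factor^2 and the p-values shrink accordingly.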

bvilhjal commented 4 years ago

Thanks for your question. Fortunately it seems that you found a reasonable solution. Generally I would worry about applying to chrMT, as it supposedly has no (normal) recombination. Hence LD patterns could be weird.

hrafnfaedhir commented 4 years ago

Thank you for your quick reply. We understand that chrMT has no recombination, and therefore cannot be in linkage equilibrium. We hoped that LDpred would be able to help us take the inherent LD into consideration. What are the assumptions that LDpred makes about LD that might be problematic if applied to chrMT? Does LDpred make use of the PLINK files' recombination map during its calculations? Also, we ran into problems during the ldpred score step when trying to include a covariate file. I know the Q&A page makes mention of importing the results into R to estimate the total variance accounted for by the PRS and covariates. Does LDpred take into account the covariates during the calculation of the PRS or is only used when calculating the R2?

Thank you for your time.

bvilhjal commented 4 years ago

LDpred does not (currently) rely on a recombination map, so that shouldn't be a problem. LDpred does not take covariates into account when calculating polygenic scores. Also, the covariate option in the score is buggy, and I generally do not recommend using it (for now).

Best, Bjarni

hrafnfaedhir commented 4 years ago

What impact would having a lambda (GIF) less than one have on the calculations? Is it necessary to assert that lambda must be greater than 1? Above you stated that the calculation of lambda is to QC the summary statistics.

natashapm commented 4 years ago

I am having the same issue as others in this thread regarding the error: "Something is wrong with the GWAS summary statistics, parsing of them, or the given GWAS sample size (N). Lambda (the mean Chi-square statistic) is too small."

It seems as though the solutions proposed above include the following:

1. Use the --beta flag. This seems to be out of date.
2. Edit the LDpred code to accept lower lambdas.

Can you please elaborate on what accepting lower lambdas would mean, as in hrafnfaedhir's comment? Are there any other options? Thank you!

bbitarello commented 3 years ago

I am having a related problem, @natashapm. The flag they mention doesn't exist anymore, but I use --eff_type LINREG, and I am getting high lambda values (>3) even though the summary statistics should already be corrected.

bbitarello commented 3 years ago

To clarify, the GC lambda value ldpred gives me is much higher (~2-fold) than what I get if I calculate it manually in R.
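One possible source of such a gap, offered as a guess rather than a diagnosis: the error message in this thread describes lambda as the mean chi-square statistic, whereas a manual genomic-control calculation commonly uses the median chi-square divided by about 0.4549 (the median of a 1-df chi-square distribution). The two can differ a lot when a few SNPs have very large statistics. The sketch below compares them on toy z-scores; the function names are illustrative.

```python
from statistics import mean, median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df

def gc_lambda_median(z_scores):
    """Median-based genomic control lambda."""
    return median(z * z for z in z_scores) / CHI2_1DF_MEDIAN

def mean_chi_square(z_scores):
    """Mean chi-square statistic."""
    return mean(z * z for z in z_scores)

# Toy data with one very strong signal.
z_scores = [0.3, -0.6, 0.7, -0.9, 6.0]
print(gc_lambda_median(z_scores), mean_chi_square(z_scores))
```

The median is barely moved by the one large statistic, while the mean is dominated by it.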

bnj50 commented 3 years ago

They haven't answered at all for the last two years! Try another program such as PRS-CS.
