bulik / ldsc

LD Score Regression (LDSC)
GNU General Public License v3.0

Filtering of munge_sumstats.py #439

Open yesyj-yuns opened 1 week ago

yesyj-yuns commented 1 week ago

Hi,

I tried to convert my GWAS data into a sumstats file using munge_sumstats.py.

According to https://github.com/bulik/ldsc/wiki/Summary-Statistics-File-Format#sumstats , munge_sumstats.py filters out ambiguous SNPs and non-SNP variants. Could you please let me know why ambiguous SNPs are filtered when converting GWAS summary statistics to a .sumstats file?

My second question was already answered in #375, but I still don't understand the answer, so I'm asking again. My GWAS summary statistics (logistic regression) were generated using plink2. In this data, the effect allele, A1, is the alternative allele, not the reference allele. In this case, should the A1 allele in the file passed to munge_sumstats.py be the reference allele or the effect allele?

Thank you very much:)

aksarkar commented 1 week ago

@yesyj-yuns It is more difficult to detect errors when computing GWAS associations or meta-analyses for strand ambiguous variants. The simplest strategy to avoid introducing noise into ldsc from such errors is to remove ambiguous variants.
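To make "strand ambiguous" concrete: a SNP is strand ambiguous when its two alleles are complementary bases (A/T or C/G), so the pair reads the same on either DNA strand and a strand mismatch cannot be detected from the alleles alone. The check below is a minimal sketch of that idea; the function names are hypothetical and are not taken from munge_sumstats.py.

```python
# Complement of each base on the opposite DNA strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_biallelic_snp(a1, a2):
    """True when both alleles are single, distinct bases (rules out indels)."""
    a1, a2 = a1.upper(), a2.upper()
    return a1 in COMPLEMENT and a2 in COMPLEMENT and a1 != a2


def is_strand_ambiguous(a1, a2):
    """True when the alleles are each other's complement (A/T or C/G).

    For such a pair, reading the opposite strand yields the same two
    letters, so a strand flip is invisible in the allele columns.
    """
    a1, a2 = a1.upper(), a2.upper()
    return COMPLEMENT.get(a1) == a2


# Hypothetical example variants as (A1, A2) pairs.
variants = [("A", "G"), ("A", "T"), ("C", "G"), ("T", "C"), ("A", "AT")]
kept = [v for v in variants
        if is_biallelic_snp(*v) and not is_strand_ambiguous(*v)]
```

In this toy list only A/G and T/C would survive: A/T and C/G are ambiguous, and A/AT is not a SNP.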

For your second question, ldsc flips alleles and effect sizes as necessary to ensure that A1 is ref, A2 is alt, and effect allele is alt. As long as A1 in the input is consistently ref or consistently alt, you will get the correct answer.

yesyj-yuns commented 1 week ago

Thank you so much for your response.

I'm really sorry, but I'm asking again because I still don't understand the answer to the second question.

If you look at describe_cname in munge_sumstats.py, the statistics are defined relative to A1, as shown below.

  'A1': 'Allele 1, interpreted as ref allele for signed sumstat.',
  'A2': 'Allele 2, interpreted as non-ref allele for signed sumstat.',
  'Z': 'Z-score (0 --> no effect; above 0 --> A1 is trait/risk increasing)',
  'OR': 'Odds ratio (1 --> no effect; above 1 --> A1 is risk increasing)',
  'BETA': '[linear/logistic] regression coefficient (0 --> no effect; above 0 --> A1 is trait/risk increasing)',
  'LOG_ODDS': 'Log odds ratio (0 --> no effect; above 0 --> A1 is risk increasing)',

In my GWAS file, A1 is the effect allele but not the reference allele. In this case, can I supply the A1 allele of the input GWAS as the effect allele?

If not, should the A1 allele in the input GWAS summary file be set to the reference allele, with the statistics of my GWAS (beta, OR) also multiplied by -1 to match?

I would like to thank you once again. Best regards.

aksarkar commented 1 week ago

I misspoke because I didn't read the source code correctly: you do need to make sure that A1 is ref and multiply beta by -1 accordingly.
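As a rough illustration of what swapping the allele columns involves: once A1 and A2 trade places, the signed statistic has to describe the new A1. A beta (or log odds ratio, or Z-score) changes sign, while an odds ratio is inverted rather than multiplied by -1. The helper below is a hypothetical sketch, not part of ldsc, and the column names simply mirror the descriptions quoted earlier in the thread.

```python
def swap_alleles(a1, a2, beta=None, odds_ratio=None):
    """Re-express an association relative to the other allele.

    Hypothetical helper for illustration only. Swapping the allele
    columns negates an additive effect (beta) and inverts a
    multiplicative one (odds ratio), since log(1 / OR) == -log(OR).
    """
    flipped = {"A1": a2, "A2": a1}
    if beta is not None:
        flipped["BETA"] = -beta
    if odds_ratio is not None:
        flipped["OR"] = 1.0 / odds_ratio
    return flipped


# Made-up record: effect reported for allele G with beta 0.2 and OR 1.25.
record = swap_alleles("G", "A", beta=0.2, odds_ratio=1.25)
# record now has A1 = "A", A2 = "G", BETA = -0.2 and OR close to 0.8.
```

Whichever allele ends up in the A1 column, the rule to keep in mind from the column descriptions is that a positive Z, BETA, or LOG_ODDS (or an OR above 1) is read as A1 being trait- or risk-increasing.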