Open SylviaXJY opened 1 year ago
Hello, thanks for reporting this issue. The error indicates that there is an allele flip in your input data. If this is not the case, there may be a bug in the code.
Could you please provide a reproducible dataset? I'll check it against the code.
Best Regards, Cue
Hello, SylviaXJY
I found what caused the error.
"ldsc_preprocess.py" does not allow allele mismatches: the risk (A1) and reference (A2) alleles must match across all .sumstats files.
In the example dataset, I found several allele mismatches. One of them is SNP 'rs12478753':
ms: rs12478753 G(A1) A(A2) -1.762 115803.000
TID: rs12478753 A(A1) G(A2) 447388 0.752
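To list such mismatches before running the preprocessing step, a check along the following lines may help. This is a minimal sketch, not code from PLEIO: the function name `find_allele_mismatches`, the dict layout (SNP ID mapped to an (A1, A2) tuple), and the second SNP ID `rs0000001` are hypothetical and used only for illustration.

```python
def find_allele_mismatches(trait_a, trait_b):
    """Return sorted SNP IDs whose (A1, A2) pair differs between two traits.

    Each argument maps SNP ID -> (A1, A2). Only SNPs present in both
    traits are compared. (Hypothetical helper, not part of PLEIO.)
    """
    shared = trait_a.keys() & trait_b.keys()
    return sorted(snp for snp in shared if trait_a[snp] != trait_b[snp])


# rs12478753 is the SNP quoted above; rs0000001 is a made-up placeholder.
ms = {"rs12478753": ("G", "A"), "rs0000001": ("C", "T")}
tia = {"rs12478753": ("A", "G"), "rs0000001": ("C", "T")}

print(find_allele_mismatches(ms, tia))  # prints ['rs12478753']
```

Any SNP this reports has to be flipped or dropped in one of the files before both are passed to ldsc_preprocess.py.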
Please do the following QC on your summary statistics before you analyze them using PLEIO. I will elaborate on the details of the QC process for PLEIO analysis in the wiki later:
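The QC steps themselves are not listed in this thread. As an illustration only, one common harmonization step is to align each record to a chosen reference allele pair and negate the effect when A1 and A2 are swapped. The sketch below is an assumption, not the PLEIO author's procedure: the function name `harmonize` and the record layout are hypothetical, treating -1.762 as the Z column is a guess based on the row quoted above, and strand flips (complementary alleles) are deliberately not handled.

```python
def harmonize(record, ref):
    """Align one summary-statistics record to a reference (A1, A2) pair.

    record: dict with keys "A1", "A2", "Z" (hypothetical layout).
    ref:    tuple (A1, A2) taken from the reference trait.
    Returns a new dict, or None when a simple A1/A2 swap cannot
    reconcile the alleles (such SNPs would need to be dropped).
    """
    a1, a2, z = record["A1"], record["A2"], record["Z"]
    if (a1, a2) == ref:
        return dict(record)  # already aligned
    if (a2, a1) == ref:
        # Swapped alleles: exchange A1/A2 and flip the sign of the effect.
        return {**record, "A1": a2, "A2": a1, "Z": -z}
    return None  # different alleles or strand issue; not handled here


ms_row = {"A1": "G", "A2": "A", "Z": -1.762}
print(harmonize(ms_row, ("A", "G")))
```

Records for which this returns None would have to be removed from every trait so that the remaining SNPs share identical A1/A2 columns.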
P.S. During the investigation, I found a minor bug and fixed it.
Best Regards, Cue
I also found a difference between the log file I generated from the data and the one you generated. Currently, I don't know what causes this difference. Let me know if you have any other questions related to this.
Below is the .log I got from the ldsc_preprocess:
Call:
./ldsc_preprocess.py \
--ref-ld-chr eur_w_ld_chr/ \
--out output \
--input input.txt \
--w-ld-chr eur_w_ld_chr/
Beginning analysis at Sat Nov 12 16:24:56 2022
Read 2 traits from input
Failed to create a directory at : output
Failed to create a directory at : output/temp
Dividing z-scores with the correction factor (the squared root of the LDSC h2 analysis intercept value).
Dividing z-scores with the correction factor (the squared root of the LDSC h2 analysis intercept value).
Generate input files (sumstats.txt.gz) for LDSC --rg analysis
Generated output/temp/transient_ischemic_attack.sumstats.tsv.sumstats.gz
Generated output/temp/ms.sumstats.tsv.sumstats.gz
Number of variants in common: 1051555
Found 0 duplicated variants
Traceback (most recent call last):
File "./ldsc_preprocess.py", line 482, in
Analysis finished at Sat Nov 12 16:26:27 2022 Total time elapsed: 1.0m:31.29s
Thank you for your prompt response, and thank you for the solution! I will try it.
Best, Xiongjy
When I call ./ldsc_preprocess.py -h, it runs and I successfully get "sg.txt.gz" and "ce.txt.gz". However, when I check the log file, I see the following error:
ValueError: Found Allele mismatch: ['rs7299872' 'rs7299873' 'rs7299874' ... 'rs10943760' 'rs7254116' 'rs11954743']
And I am sure I have already adjusted my VCF files to match the reference. What am I missing here? Thanks a lot!