Open yuupei opened 12 months ago
Hi there Yuupei,
Did you ever solve this particular issue with LDSC? I have been encountering it myself now and am a bit stuck with troubleshooting.
Googling this particular error provided a few different results, but none of them seemed to be relevant to my own issue, so I would also appreciate some help with this.
Here is my log file:
*********************************************************************
* LD Score Regression (LDSC)
* Version 1.0.1
* (C) 2014-2019 Brendan Bulik-Sullivan and Hilary Finucane
* Broad Institute of MIT and Harvard / MIT Department of Mathematics
* GNU General Public License v3
*********************************************************************
Call:
./munge_sumstats.py \
--out /scratch/project_2007428/projects/prj_001_cost_gwas/processing/ldsc_intermediate_files//UKB_ALL_ALL_ldsc_input_ALL_ALL_ldsc_munged \
--merge-alleles /scratch/project_2007428/users/Zhiyu/Tool/ldsc/Ref/w_hm3.snplist \
--sumstats /scratch/project_2007428/projects/prj_001_cost_gwas/processing/ldsc_intermediate_files/UKB_ALL_ALL_ldsc_input.txt.gz
ERROR converting summary statistics:
Traceback (most recent call last):
File "/projappl/project_2007428/software/ldsc/munge_sumstats.py", line 611, in munge_sumstats
'Could not find a signed summary statistic column.')
ValueError: Could not find a signed summary statistic column.
Conversion finished at Tue Jan 16 18:39:59 2024
Total time elapsed: 0.0s
And here is the header of my input file (in R):
> head(sumstats)
rsid a1 a0 n p beta1
rs687513 A G 212765 0.3625225 -0.00456243
rs6577165 T A 212765 0.3173474 0.41311700
rs7529831 A C 212765 0.8126003 -0.02278910
rs6577221 T C 212765 0.6074954 0.04990290
rs12733701 G A 212765 0.6059015 -0.01776350
rs17124137 A C 212765 0.1950065 0.13357700
> summary(sumstats)
rsid a1 a0 n
Length:1230617 Length:1230617 Length:1230617 Min. :212765
Class :character Class :character Class :character 1st Qu.:212765
Mode :character Mode :character Mode :character Median :212765
Mean :212765
3rd Qu.:212765
Max. :212765
p beta1
Min. :0.0000 Min. :-1.8848700
1st Qu.:0.2041 1st Qu.:-0.0036829
Median :0.4561 Median : 0.0000183
Mean :0.4681 Mean : 0.0000265
3rd Qu.:0.7249 3rd Qu.: 0.0037395
Max. :1.0000 Max. : 1.4071600
As far as I can make out, other reports of this error point to a few possible causes: the file needs to be whitespace-delimited (mine is tab-delimited); the arguments might be parsed incorrectly (but LDSC seems to be running correctly); there may be NAs or NaNs in the file (as you can see, there aren't); or there is a mismatch between rsID and data (I don't see how that could be happening either).
Essentially, as far as I'm aware my input matches the requirements for munge_sumstats, but it's still not quite working. Any help would be appreciated!
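For anyone wanting to rule out the causes listed above on their own file, here is a minimal Python sketch. It only checks for rows with the wrong number of fields and for missing-value tokens; the function names and the set of NA tokens are assumptions made for illustration, not part of LDSC.

```python
import gzip

# Tokens treated as missing values (an assumption; adjust to your own file).
NA_TOKENS = {'', 'NA', 'NaN', 'nan', '.'}

def scan_sumstats(lines, delimiter=None):
    """Return (header, n_bad_width, n_missing) for an iterable of text lines.

    With delimiter=None, fields are split on any run of whitespace,
    which covers both tab- and space-delimited files.
    """
    rows = (line.rstrip('\n').split(delimiter) for line in lines)
    header = next(rows)
    n_bad_width = n_missing = 0
    for fields in rows:
        # A row with a different field count than the header is suspicious.
        if len(fields) != len(header):
            n_bad_width += 1
        n_missing += sum(field in NA_TOKENS for field in fields)
    return header, n_bad_width, n_missing

def scan_gz(path):
    """Run the same scan on a gzipped text file."""
    with gzip.open(path, 'rt') as handle:
        return scan_sumstats(handle)
```

If both counts come back as zero, the delimiter and missing-value explanations can be set aside and the header names themselves are the next thing to look at.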
Hello,
The munge_sumstats.py script uses several lists of column header names that are commonly used in GWAS summary statistics files. Lines 85-96 of the script show which headers are accepted for the effect size (BETA or odds ratio) of your GWAS.
# SIGNED STATISTICS
'ZSCORE': 'Z',
'Z-SCORE': 'Z',
'GC_ZSCORE': 'Z',
'Z': 'Z',
'OR': 'OR',
'B': 'BETA',
'BETA': 'BETA',
'LOG_ODDS': 'LOG_ODDS',
'EFFECTS': 'BETA',
'EFFECT': 'BETA',
'SIGNED_SUMSTAT': 'SIGNED_SUMSTAT',
If your header doesn't match, you need to specify the name of your effect size column using the argument --signed-sumstats. There are additional arguments similar to this for specifying the names of other columns, such as --snp for specifying your RSID column. For example, in @Sabor117's case, they would need to use the argument like this: --signed-sumstats beta1,0 (the column name, then a comma, then the value that means no effect, which is 0 for a beta).
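To see in advance whether a header will be recognised, the lookup can be sketched in a few lines of Python. The dictionary is copied from the list quoted above; the function name and the simple upper-casing of header names are assumptions for illustration, not the script's actual code.

```python
# Signed-statistic header names, copied from the list quoted above.
SIGNED_STAT_NAMES = {
    'ZSCORE': 'Z', 'Z-SCORE': 'Z', 'GC_ZSCORE': 'Z', 'Z': 'Z',
    'OR': 'OR', 'B': 'BETA', 'BETA': 'BETA', 'LOG_ODDS': 'LOG_ODDS',
    'EFFECTS': 'BETA', 'EFFECT': 'BETA', 'SIGNED_SUMSTAT': 'SIGNED_SUMSTAT',
}

def find_signed_columns(header):
    """Return {original_name: canonical_name} for recognised signed-stat columns."""
    # Assumption: names are compared case-insensitively as whole strings.
    return {name: SIGNED_STAT_NAMES[name.upper()]
            for name in header if name.upper() in SIGNED_STAT_NAMES}

# The header from the question: 'beta1' is not in the list, so nothing matches.
print(find_signed_columns(['rsid', 'a1', 'a0', 'n', 'p', 'beta1']))  # {}
# A column named exactly 'beta' would have been picked up.
print(find_signed_columns(['rsid', 'beta']))  # {'beta': 'BETA'}
```

An empty result here corresponds to the "Could not find a signed summary statistic column" error, and means the column has to be either renamed or passed explicitly with --signed-sumstats.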
Hi there!
Thanks for getting back to this question. I can confirm that the issue on my end was that I had not correctly specified the --signed-sumstats column (it actually was not initially clear to me that this was meant to mean the "effect size" of the given allele).
Since making my post last week, I adjusted the summary stats and my LDSC code and it worked with the following:
./munge_sumstats.py \
--signed-sumstats zscore1,0 \
--out /scratch/project_2007428/projects/prj_001_cost_gwas/processing/ldsc_intermediate_files//UKB_ALL_ALL_ldsc_munged \
--merge-alleles /scratch/project_2007428/users/Zhiyu/Tool/ldsc/Ref/w_hm3.snplist \
--a1-inc \
--N-col n \
--a1 a1 \
--a2 a0 \
--snp rsid \
--sumstats /scratch/project_2007428/projects/prj_001_cost_gwas/processing/ldsc_intermediate_files/UKB_ALL_ALL_ldsc_input.txt.gz \
--p p
Note, I changed my effect sizes from betas into Z-scores (beta1 became zscore1) as the documentation for LDSC seemed to suggest that it preferred using Z-scores or ORs to betas. I also included the --a1-inc flag as my Z-scores were always related to the A1 (which I hope is the correct usage).
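The post does not say how the betas were converted. With a standard error column, Z is simply beta divided by its standard error; the file shown earlier has no such column, so one possible route (an assumption, not necessarily what was done here) is to take the magnitude from the two-sided p-value and the sign from the beta. A minimal Python sketch, with a hypothetical function name:

```python
import math
from statistics import NormalDist

def z_from_beta_and_p(beta, p):
    """Signed Z-score: magnitude from a two-sided p-value, sign from beta.

    Assumes p comes from a two-sided test on a normally distributed statistic.
    """
    if not 0.0 < p <= 1.0:
        # p == 0 (often an underflowed tiny value) would give an infinite Z.
        raise ValueError("p must be in (0, 1]")
    # Use the lower tail, p / 2, so very small p-values keep their precision.
    magnitude = -NormalDist().inv_cdf(p / 2.0)
    return math.copysign(magnitude, beta)

# First row of the table shown earlier: the negative beta gives a negative Z.
print(z_from_beta_and_p(-0.00456243, 0.3625225))
```

Rows with p reported as exactly 0 need separate handling (for example, dropping them or recomputing from the original test statistic), since the sketch deliberately rejects them.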
Thanks again for the response here!
Hi, I am new here.
When I try to run my data, I encounter this issue every time:
ERROR converting summary statistics:
Traceback (most recent call last):
File "./munge_sumstats.py", line 611, in munge_sumstats
'Could not find a signed summary statistic column.')
ValueError: Could not find a signed summary statistic column.
Conversion finished at Thu Sep 28 07:30:17 2023
Total time elapsed: 0.0s
Traceback (most recent call last):
File "./munge_sumstats.py", line 745, in <module>
munge_sumstats(parser.parse_args(), p=True)
File "./munge_sumstats.py", line 611, in munge_sumstats
'Could not find a signed summary statistic column.')
ValueError: Could not find a signed summary statistic column.
Could someone explain what this means and what I should do?
Thank you