bulik / ldsc

LD Score Regression (LDSC)
GNU General Public License v3.0

Munge Error - ValueError: could not convert string to float: OR #329

Open dzimmerman-amc opened 2 years ago

dzimmerman-amc commented 2 years ago

Hi everyone,

I am trying to munge some data for later use in ldsc and I run into this error:

/home/expcard/Projects/GWAS_SCA/GWAS_NTR/LDSC/ldsc/munge_sumstats.py \
--sumstats /home/dominicz/LDSC/SCAMILIFELINESforMETALnoSNPFinalLDSCmunge.txt \
--N 18236 \
--chunksize 500000 \
--out /home/dominicz/LDSC/SCAMILIFELINESforMETALnoSNPFinalLDSC.munge.txt \
--merge-alleles /home/dominicz/LDSC/w_hm3.snplist


  • LD Score Regression (LDSC)
  • Version 1.0.1
  • (C) 2014-2019 Brendan Bulik-Sullivan and Hilary Finucane
  • Broad Institute of MIT and Harvard / MIT Department of Mathematics
  • GNU General Public License v3

Call:
./munge_sumstats.py \
--out /home/dominicz/LDSC/SCAMILIFELINESforMETALnoSNPFinalLDSC.munge.txt \
--merge-alleles /home/dominicz/LDSC/w_hm3.snplist \
--chunksize 500000 \
--N 18236.0 \
--sumstats /home/dominicz/LDSC/SCAMILIFELINESforMETALnoSNPFinalLDSCmunge.txt

Interpreting column names as follows:
N: Sample size
A1: Allele 1, interpreted as ref allele for signed sumstat.
P: p-Value
A2: Allele 2, interpreted as non-ref allele for signed sumstat.
SNP: Variant ID (e.g., rs number)
OR: Odds ratio (1 --> no effect; above 1 --> A1 is risk increasing)

Reading list of SNPs for allele merge from /home/dominicz/LDSC/w_hm3.snplist
Read 1217311 SNPs for allele merge.
Reading sumstats from /home/dominicz/LDSC/SCAMILIFELINESforMETALnoSNPFinalLDSCmunge.txt into memory 500000 SNPs at a time.
.
ERROR converting summary statistics:

Traceback (most recent call last):
  File "/home/expcard/Projects/GWAS_SCA/GWAS_NTR/LDSC/ldsc/munge_sumstats.py", line 686, in munge_sumstats
    dat = parse_dat(dat_gen, cname_translation, merge_alleles, log, args)
  File "/home/expcard/Projects/GWAS_SCA/GWAS_NTR/LDSC/ldsc/munge_sumstats.py", line 238, in parse_dat
    for block_num, dat in enumerate(dat_gen):
  File "/home/dominicz/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/io/common.py", line 93, in <lambda>
    BaseIterator.next = lambda self: self.__next__()
  File "/home/dominicz/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/io/parsers.py", line 959, in __next__
    return self.get_chunk()
  File "/home/dominicz/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/io/parsers.py", line 1019, in get_chunk
    return self.read(nrows=size)
  File "/home/dominicz/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/io/parsers.py", line 982, in read
    ret = self._engine.read(nrows)
  File "/home/dominicz/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/io/parsers.py", line 1719, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 890, in pandas._libs.parsers.TextReader.read (pandas/_libs/parsers.c:10862)
  File "pandas/_libs/parsers.pyx", line 924, in pandas._libs.parsers.TextReader._read_low_memory (pandas/_libs/parsers.c:11343)
  File "pandas/_libs/parsers.pyx", line 989, in pandas._libs.parsers.TextReader._read_rows (pandas/_libs/parsers.c:12175)
  File "pandas/_libs/parsers.pyx", line 1117, in pandas._libs.parsers.TextReader._convert_column_data (pandas/_libs/parsers.c:14136)
  File "pandas/_libs/parsers.pyx", line 1190, in pandas._libs.parsers.TextReader._convert_tokens (pandas/_libs/parsers.c:15330)
ValueError: could not convert string to float: OR

Conversion finished at Wed Nov 10 15:34:55 2021
Total time elapsed: 2.42s
Traceback (most recent call last):
  File "/home/expcard/Projects/GWAS_SCA/GWAS_NTR/LDSC/ldsc/munge_sumstats.py", line 745, in <module>
    munge_sumstats(parser.parse_args(), p=True)
  File "/home/expcard/Projects/GWAS_SCA/GWAS_NTR/LDSC/ldsc/munge_sumstats.py", line 686, in munge_sumstats
    dat = parse_dat(dat_gen, cname_translation, merge_alleles, log, args)
  File "/home/expcard/Projects/GWAS_SCA/GWAS_NTR/LDSC/ldsc/munge_sumstats.py", line 238, in parse_dat
    for block_num, dat in enumerate(dat_gen):
  File "/home/dominicz/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/io/common.py", line 93, in <lambda>
    BaseIterator.next = lambda self: self.__next__()
  File "/home/dominicz/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/io/parsers.py", line 959, in __next__
    return self.get_chunk()
  File "/home/dominicz/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/io/parsers.py", line 1019, in get_chunk
    return self.read(nrows=size)
  File "/home/dominicz/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/io/parsers.py", line 982, in read
    ret = self._engine.read(nrows)
  File "/home/dominicz/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/io/parsers.py", line 1719, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 890, in pandas._libs.parsers.TextReader.read (pandas/_libs/parsers.c:10862)
  File "pandas/_libs/parsers.pyx", line 924, in pandas._libs.parsers.TextReader._read_low_memory (pandas/_libs/parsers.c:11343)
  File "pandas/_libs/parsers.pyx", line 989, in pandas._libs.parsers.TextReader._read_rows (pandas/_libs/parsers.c:12175)
  File "pandas/_libs/parsers.pyx", line 1117, in pandas._libs.parsers.TextReader._convert_column_data (pandas/_libs/parsers.c:14136)
  File "pandas/_libs/parsers.pyx", line 1190, in pandas._libs.parsers.TextReader._convert_tokens (pandas/_libs/parsers.c:15330)
ValueError: could not convert string to float: OR

Would anyone know how to fix this? I thought it may be caused by empty values in the OR column but there aren't any.

Thanks in advance!

kdack commented 1 year ago

You probably fixed this long ago, but for anyone else who finds this from Google (like I did): the problem is invalid values in numeric columns.

An easy way to find the problem column in R is to run read.table("filename", header=TRUE) and check what data type was assigned to each column. If the "n", "beta", or "p-value" columns are character type, there is your problem: a non-numeric value has somehow slipped in.
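The same check can be sketched in Python with pandas, which is what munge_sumstats.py itself uses to read the file. This is only a rough sketch: the helper name find_non_numeric and the demo data are made up for illustration, and whitespace-delimited input is assumed. Note that in the traceback above the string pandas failed to convert is literally "OR", the column name, so one possibility worth checking is a header line repeated somewhere inside the file. The demo plants exactly that case.

```python
import io

import pandas as pd


def find_non_numeric(df, columns):
    """Return {column: [(row_label, raw_value), ...]} for values that are not numbers.

    Hypothetical helper for illustration; it is not part of ldsc.
    """
    problems = {}
    for col in columns:
        raw = df[col].astype(str).str.strip()
        # errors="coerce" turns anything that is not a number into NaN.
        parsed = pd.to_numeric(raw, errors="coerce")
        bad = raw[parsed.isna()]
        if not bad.empty:
            problems[col] = list(zip(bad.index, bad))
    return problems


# Made-up demo data: the header line is repeated in the middle of the file.
demo = io.StringIO(
    "SNP A1 A2 OR P N\n"
    "rs1 A G 1.05 0.20 18236\n"
    "SNP A1 A2 OR P N\n"
    "rs2 C T 0.97 0.51 18236\n"
)

# dtype=str and keep_default_na=False keep every field exactly as written.
df = pd.read_csv(demo, sep=r"\s+", dtype=str, keep_default_na=False)
report = find_non_numeric(df, ["OR", "P", "N"])
print(report)
```

For a real file, pass its path to pd.read_csv instead of the StringIO object and list whichever columns should be numeric. Row labels in the report are 0-based and count data rows, not the header.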

nini-tech23 commented 1 year ago

@kdack, hello, I have the same problem here with the p-value column. Even after I converted the values to numeric in R and saved the file again, I still get the same error, and the column is still recognized as character. Could you share how you solved it?

kdack commented 1 year ago


Ensuring all columns were numeric worked for me, but I imagine this type of error will occur for any formatting problem: stray spaces or NA values, perhaps.

You could try making up some demo data and checking if it works, just to see that everything is working as intended. Then take random smaller samples of your data and check if the error occurs on all of them. That would help you narrow down exactly which lines of data are causing the issue.
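A rough Python sketch of that narrowing-down idea follows. It is a variation, not exactly what is described above: it walks the file in consecutive chunks rather than random samples, and reports which chunks contain a value that cannot be read as a number. The function name failing_chunks and the demo data are made up for illustration, and whitespace-delimited input is assumed.

```python
import io

import pandas as pd


def failing_chunks(source, column, chunksize):
    """Return 0-based indices of chunks whose `column` has a non-numeric value.

    Hypothetical helper for illustration; it is not part of ldsc.
    """
    failing = []
    reader = pd.read_csv(source, sep=r"\s+", dtype=str,
                         keep_default_na=False, chunksize=chunksize)
    for i, chunk in enumerate(reader):
        # Anything that is not a number becomes NaN under errors="coerce".
        parsed = pd.to_numeric(chunk[column].str.strip(), errors="coerce")
        if parsed.isna().any():
            failing.append(i)
    return failing


# Made-up demo data: 10 rows, with one bad P value planted in data row 7.
rows = ["rs%d A G 1.0 0.5 18236" % i for i in range(10)]
rows[7] = "rs7 A G 1.0 <0.001 18236"
demo_text = "SNP A1 A2 OR P N\n" + "\n".join(rows) + "\n"
clean_text = demo_text.replace("<0.001", "0.001")

# First confirm the check passes on clean demo data, then run it on the bad data.
print(failing_chunks(io.StringIO(clean_text), "P", chunksize=4))
print(failing_chunks(io.StringIO(demo_text), "P", chunksize=4))
```

With a chunk size of 4, data rows 4 to 7 fall in chunk 1, so only that chunk is reported for the bad demo data. Rerunning with a smaller chunk size on the failing region narrows it down to individual lines.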