Al-Murphy / MungeSumstats

Rapid standardisation and quality control of GWAS or QTL summary statistics
https://doi.org/doi:10.18129/B9.bioc.MungeSumstats

ValueError: Improperly formatted sumstats file: #187

AndreaG5 closed this issue 3 months ago

AndreaG5 commented 3 months ago

1. Bug description

Hi,

thank you for the tool. I am struggling to use ldsc.py with summary statistics produced by your R library. I took two sets of summary statistics from the GWAS Catalog; I left one as it was and added a column to the other using awk. I can run format_sumstats() on both, specifying only the genome build and output directories, with no errors or warnings. When I pass the sumstats output by the function directly to ldsc.py --h2, I get the error in the title:

Console output

Total time elapsed: 0.01s
Traceback (most recent call last):
  File "ldsc/ldsc.py", line 644, in <module>
    sumstats.estimate_h2(args, log)
  File "./ldsc/ldscore/sumstats.py", line 326, in estimate_h2
    args, log, args.h2)
  File "./ldsc/ldscore/sumstats.py", line 242, in _read_ld_sumstats
    sumstats = _read_sumstats(args, log, fh, alleles=alleles, dropna=dropna)
  File "./ldsc/ldscore/sumstats.py", line 163, in _read_sumstats
    sumstats = ps.sumstats(fh, alleles=alleles, dropna=dropna)
  File "./ldsc/ldscore/parse.py", line 91, in sumstats
    raise ValueError('Improperly formatted sumstats file: ' + str(e.args))
ValueError: Improperly formatted sumstats file: ('Usecols do not match names.',)

Honestly, I don't know where the error is. The file is tab-separated and the column names match the input LDSC expects. I tried keeping just one signed_stats column, but that did not work. I saw in the wiki that the output can be used directly in LDSC, but I am not able to do so.

Do you have any suggestions? Is it a known problem? (I'll be able to provide scripts, but not at the moment.)

Thank you so much!

Al-Murphy commented 3 months ago

Hi, happy to help, but I need more information to do so. Can you let me know the version of MungeSumstats you are using, the code you ran, and the console log output? There is a parameter that ensures the output is in a valid format for LDSC (save_format='LDSC'); did you use it?
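
In the meantime, a quick header check can narrow down a "Usecols do not match names" error before re-running anything. The sketch below is not part of MungeSumstats or LDSC. It assumes that ldsc.py asks pandas for the columns SNP, Z and N (plus A1 and A2 when alleles are needed) and that the file is whitespace- or tab-delimited. The function name missing_ldsc_columns is made up for illustration.

```python
import gzip

# Assumption: the column names ldsc.py requests when it reads a .sumstats file.
REQUIRED = ["SNP", "Z", "N"]
# Assumption: extra columns requested only when allele information is needed.
ALLELES = ["A1", "A2"]


def missing_ldsc_columns(path, alleles=False):
    """Return the expected column names that are absent from the file header."""
    # Open gzipped and plain-text files the same way, in text mode.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        # The header is the first line; split on any whitespace (tabs included).
        header = handle.readline().split()
    wanted = REQUIRED + (ALLELES if alleles else [])
    return [name for name in wanted if name not in header]
```

Calling missing_ldsc_columns on the formatted file returns an empty list when every expected column is present. Any names it does return are the ones a parser requesting those columns would fail to find.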

Thanks, Alan.

AndreaG5 commented 3 months ago

Oh I am so sorry, I was sure I put it in the argument list. My bad. Everything worked fine, thank you so much, sorry for the mistake!