bulik / ldsc

LD Score Regression (LDSC)
GNU General Public License v3.0

Issue with munge_sumstats.py : KeyError: u"None of [Int64Index([-1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n #245

Open montenegrina opened 3 years ago

montenegrina commented 3 years ago

Hello,

I saw a few tickets reporting a similar issue when using:

./munge_sumstats.py \
  --sumstats UKB.GWAS.txt \
  --N 336974 \
  --ignore BETA \
  --out UKB_DR \
  --merge-alleles w_hm3.snplist

I am getting the error below. Other tickets mention downgrading pandas. When I run the above command without --merge-alleles w_hm3.snplist, I don't get any error.

My question is what is the purpose of --merge-alleles w_hm3.snplist?

Is the purpose of that flag to keep only the SNPs from my UKB_DR that are present in w_hm3.snplist, and to use the A1 and A2 values given in w_hm3.snplist? If so, I can do that in R without changing my pandas version.
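For reference, the filtering described above can be sketched in pandas as well. This is only an illustration under assumptions: `filter_to_snplist` is a hypothetical helper (not part of ldsc), both tables are assumed to have `SNP`, `A1` and `A2` columns, and the allele check is simplified to "same pair in either order". It does not handle strand flips, so it may not reproduce what `munge_sumstats.py` does with `--merge-alleles`.

```python
import pandas as pd


def filter_to_snplist(sumstats: pd.DataFrame, snplist: pd.DataFrame) -> pd.DataFrame:
    """Keep SNPs that appear in snplist with the same allele pair.

    Hypothetical helper for illustration only. Assumes both frames
    have SNP, A1 and A2 columns. Strand flips are NOT considered.
    """
    # Inner merge drops every SNP that is not in the reference list.
    merged = sumstats.merge(snplist, on="SNP", how="inner", suffixes=("", "_ref"))

    # Accept the allele pair in either orientation.
    same = (merged["A1"] == merged["A1_ref"]) & (merged["A2"] == merged["A2_ref"])
    swapped = (merged["A1"] == merged["A2_ref"]) & (merged["A2"] == merged["A1_ref"])
    keep = same | swapped

    # Return only the original sumstats columns for the rows that passed.
    return merged.loc[keep, sumstats.columns].reset_index(drop=True)


# Made-up toy data, not taken from UKB.GWAS.txt or w_hm3.snplist.
toy_sumstats = pd.DataFrame({
    "SNP": ["rs1", "rs2", "rs3", "rs4"],
    "A1": ["A", "C", "G", "A"],
    "A2": ["G", "T", "A", "C"],
    "Z": [0.1, -0.2, 0.3, 0.4],
})
toy_snplist = pd.DataFrame({
    "SNP": ["rs1", "rs3", "rs4", "rs9"],
    "A1": ["A", "A", "A", "C"],
    "A2": ["G", "G", "T", "T"],
})
filtered = filter_to_snplist(toy_sumstats, toy_snplist)
print(filtered)
```

In this toy input, rs2 is dropped because it is not in the list and rs4 is dropped because its alleles differ from the listed pair.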

Thanks, Ana

My Error:

Call:
./munge_sumstats.py \
  --out UKB_DR \
  --merge-alleles w_hm3.snplist \
  --N 336974.0 \
  --sumstats UKB.GWAS.txt \
  --ignore BETA

Interpreting column names as follows:
A1: Allele 1, interpreted as ref allele for signed sumstat.
P: p-Value
Z: Z-score (0 --> no effect; above 0 --> A1 is trait/risk increasing)
A2: Allele 2, interpreted as non-ref allele for signed sumstat.
SNP: Variant ID (e.g., rs number)

Reading list of SNPs for allele merge from w_hm3.snplist
Read 1217311 SNPs for allele merge.
Reading sumstats from UKB.GWAS.txt into memory 5000000 SNPs at a time.
. done
Read 3859763 SNPs from --sumstats file.
Removed 3329298 SNPs not in --merge-alleles.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= 0.9.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with out-of-bounds p-values.
Removed 105 variants that were not SNPs or were strand-ambiguous.
530360 SNPs remain.
Removed 30 SNPs with duplicated rs numbers (530330 SNPs remain).
Using N = 336974.0
Median value of Z was -0.00381962, which seems sensible.
Removed 114 SNPs whose alleles did not match --merge-alleles (530216 SNPs remain).

ERROR converting summary statistics:

Traceback (most recent call last):
  File "./munge_sumstats.py", line 707, in munge_sumstats
    dat = allele_merge(dat, merge_alleles, log)
  File "./munge_sumstats.py", line 445, in allele_merge
    dat.loc[~jj, [i for i in dat.columns if i != 'SNP']] = float('nan')
  File "/home/anamaria/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/core/indexing.py", line 189, in __setitem__
    indexer = self._get_setitem_indexer(key)
  File "/home/anamaria/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/core/indexing.py", line 167, in _get_setitem_indexer
    return self._convert_tuple(key, is_setter=True)
  File "/home/anamaria/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/core/indexing.py", line 248, in _convert_tuple
    idx = self._convert_to_indexer(k, axis=i, is_setter=is_setter)
  File "/home/anamaria/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/core/indexing.py", line 1354, in _convert_to_indexer
    return self._get_listlike_indexer(obj, axis, **kwargs)[1]
  File "/home/anamaria/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/core/indexing.py", line 1161, in _get_listlike_indexer
    raise_missing=raise_missing)
  File "/home/anamaria/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/core/indexing.py", line 1246, in _validate_read_indexer
    key=key, axis=self.obj._get_axis_name(axis)))
KeyError: u"None of [Int64Index([-1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n ...\n -1, -1, -1, -1, -1, -1, -1, -1, -1, -1], dtype='int64', length=1217311)] are in the [index]"

Conversion finished at Thu Nov 19 13:11:12 2020
Total time elapsed: 18.13s
Traceback (most recent call last):
  File "./munge_sumstats.py", line 746, in <module>
    munge_sumstats(parser.parse_args(), p=True)
  File "./munge_sumstats.py", line 707, in munge_sumstats
    dat = allele_merge(dat, merge_alleles, log)
  File "./munge_sumstats.py", line 445, in allele_merge
    dat.loc[~jj, [i for i in dat.columns if i != 'SNP']] = float('nan')
  File "/home/anamaria/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/core/indexing.py", line 189, in __setitem__
    indexer = self._get_setitem_indexer(key)
  File "/home/anamaria/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/core/indexing.py", line 167, in _get_setitem_indexer
    return self._convert_tuple(key, is_setter=True)
  File "/home/anamaria/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/core/indexing.py", line 248, in _convert_tuple
    idx = self._convert_to_indexer(k, axis=i, is_setter=is_setter)
  File "/home/anamaria/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/core/indexing.py", line 1354, in _convert_to_indexer
    return self._get_listlike_indexer(obj, axis, **kwargs)[1]
  File "/home/anamaria/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/core/indexing.py", line 1161, in _get_listlike_indexer
    raise_missing=raise_missing)
  File "/home/anamaria/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/core/indexing.py", line 1246, in _validate_read_indexer
    key=key, axis=self.obj._get_axis_name(axis)))
KeyError: u"None of [Int64Index([-1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n ...\n -1, -1, -1, -1, -1, -1, -1, -1, -1, -1], dtype='int64', length=1217311)] are in the [index]"
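The statement that fails is the `dat.loc[~jj, ...] = float('nan')` assignment at line 445. The KeyError lists integer values (`-1`) as the row key, which suggests that `~jj` was evaluated on a non-boolean Series, so pandas treated the result as row labels rather than as a mask. That reading is an inference from the traceback, not something confirmed in this thread. A minimal sketch of the same kind of assignment with the mask cast to bool explicitly first, using made-up data and a hypothetical object-dtype `jj`:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the merged table inside allele_merge (made-up values).
dat = pd.DataFrame({
    "SNP": ["rs1", "rs2", "rs3"],
    "A1": ["A", "C", "G"],
    "Z": [1.2, 0.7, -0.5],
})

# Hypothetical mask stored with object dtype instead of bool.
# True marks rows whose alleles matched; the other rows are blanked out.
jj = pd.Series([True, False, False], dtype=object)

# Cast to a real boolean mask before inverting it,
# so that .loc does boolean row selection.
mask = jj.astype(bool)

# Blank every column except SNP on the rows that did not match.
cols = [c for c in dat.columns if c != "SNP"]
dat.loc[~mask, cols] = np.nan

print(dat)
```

Whether the same cast is the right change for line 445 of `munge_sumstats.py` depends on how `jj` is built there, which this thread does not show.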

Arushiii commented 3 years ago

I am also getting the same error.

xcliu-oc commented 1 year ago

I'm getting the same error. Is there a solution yet, two years after this was posted?