Cloufield / gwaslab

A Python package for handling and visualizing GWAS summary statistics. https://cloufield.github.io/gwaslab/
GNU General Public License v3.0

error while using rsid_to_chrpos #62

Open soumickmj opened 8 months ago

soumickmj commented 8 months ago

While using rsid_to_chrpos, I'm getting the following error:

  File "/center/genomics/soumick/fede/manipulate_sumstats/manipulate_sumstats.py", line 100, in <module>
    manipulator.assign_chpos()
  File "/center/genomics/soumick/fede/manipulate_sumstats/manipulate_sumstats.py", line 43, in assign_chpos
    self.sumstats.rsid_to_chrpos(path = self.ref_rsid_tsv19 if self.gnome_build == "19" else self.ref_rsid_tsv38)
  File "/scratch/soumick.chatterjee/conda_envs/BeegFSTorchHTBeta2/lib/python3.10/site-packages/gwaslab/Sumstats.py", line 370, in rsid_to_chrpos
    self.data = rsidtochrpos(self.data,log=self.log,**args)
  File "/scratch/soumick.chatterjee/conda_envs/BeegFSTorchHTBeta2/lib/python3.10/site-packages/gwaslab/retrievedata.py", line 68, in rsidtochrpos
    sumstats.update(dic,overwrite="True")
  File "/scratch/soumick.chatterjee/conda_envs/BeegFSTorchHTBeta2/lib/python3.10/site-packages/pandas/core/frame.py", line 7576, in update
    other = other.reindex_like(self)
  File "/scratch/soumick.chatterjee/conda_envs/BeegFSTorchHTBeta2/lib/python3.10/site-packages/pandas/core/generic.py", line 4236, in reindex_like
    return self.reindex(**d)
  File "/scratch/soumick.chatterjee/conda_envs/BeegFSTorchHTBeta2/lib/python3.10/site-packages/pandas/util/_decorators.py", line 324, in wrapper
    return func(*args, **kwargs)
  File "/scratch/soumick.chatterjee/conda_envs/BeegFSTorchHTBeta2/lib/python3.10/site-packages/pandas/core/frame.py", line 4807, in reindex
    return super().reindex(**kwargs)
  File "/scratch/soumick.chatterjee/conda_envs/BeegFSTorchHTBeta2/lib/python3.10/site-packages/pandas/core/generic.py", line 4966, in reindex
    return self._reindex_axes(
  File "/scratch/soumick.chatterjee/conda_envs/BeegFSTorchHTBeta2/lib/python3.10/site-packages/pandas/core/frame.py", line 4626, in _reindex_axes
    frame = frame._reindex_index(
  File "/scratch/soumick.chatterjee/conda_envs/BeegFSTorchHTBeta2/lib/python3.10/site-packages/pandas/core/frame.py", line 4645, in _reindex_index
    return self._reindex_with_indexers(
  File "/scratch/soumick.chatterjee/conda_envs/BeegFSTorchHTBeta2/lib/python3.10/site-packages/pandas/core/generic.py", line 5032, in _reindex_with_indexers
    new_data = new_data.reindex_indexer(
  File "/scratch/soumick.chatterjee/conda_envs/BeegFSTorchHTBeta2/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 676, in reindex_indexer
    self.axes[axis]._validate_can_reindex(indexer)
  File "/scratch/soumick.chatterjee/conda_envs/BeegFSTorchHTBeta2/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 4121, in _validate_can_reindex
    raise ValueError("cannot reindex on an axis with duplicate labels")
ValueError: cannot reindex on an axis with duplicate labels

Any idea what is causing this?

Thanks :)

Cloufield commented 8 months ago

Hi, the reason might be that there are duplicate rsIDs in the reference file or in your sumstats. I will fix this and let you know soon.
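
For anyone who wants to see the failure in isolation, here is a minimal sketch using plain pandas rather than gwaslab's actual code. The toy frames, the rsID index, and the variable names are assumptions made up for illustration. It shows that `DataFrame.update` raises a `ValueError` when the frame passed to it has a duplicated index, and that dropping the duplicated rsIDs first lets the update go through.

```python
import numpy as np
import pandas as pd

# Hypothetical sumstats indexed by rsID, with CHR/POS still missing.
sumstats = pd.DataFrame(
    {"CHR": [np.nan, np.nan], "POS": [np.nan, np.nan]},
    index=["rs1", "rs2"],
)

# Hypothetical reference table in which "rs2" is listed twice.
reference = pd.DataFrame(
    {"CHR": [1, 2, 2], "POS": [100, 200, 205]},
    index=["rs1", "rs2", "rs2"],
)

# Count duplicated rsIDs before updating.
n_dup = int(reference.index.duplicated(keep="first").sum())
print("duplicated rsIDs in reference:", n_dup)

# Updating from a frame with a duplicated index fails.
raised = False
try:
    sumstats.update(reference, overwrite=True)
except ValueError as err:
    raised = True
    print("update failed:", err)

# Keep only the first occurrence of each rsID, then update again.
reference_unique = reference[~reference.index.duplicated(keep="first")]
sumstats.update(reference_unique, overwrite=True)
print(sumstats)
```

Keeping the first occurrence is only one possible policy. Whether that is the right choice for a real reference file depends on why the rsID is duplicated in the first place.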

Cloufield commented 8 months ago

Hi, which version of gwaslab are you using? I tested with the latest version (v3.4.29) and could not reproduce the error. It seems I have already fixed this issue in https://github.com/Cloufield/gwaslab/commit/4f3b07e24f3d0fc2a869e0b47dc60b4f141270d7.
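
One way to check the installed version, sketched with the standard library only. The helper name `installed_version` is hypothetical, and the sketch assumes the package was installed under the distribution name `gwaslab`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="gwaslab"):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version())
```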