Cloufield / gwaslab

A Python package for handling and visualizing GWAS summary statistics. https://cloufield.github.io/gwaslab/
GNU General Public License v3.0

Error importing table to gl-format #60

Open swvanderlaan opened 8 months ago

swvanderlaan commented 8 months ago

I have some data in a table like below.

Ancestry | Sex | rsID | CHR | POS | Allelles | EAF | OR | OR_95LOWER | OR_95UPPER | P | N | LOCATION | GENE | EnsemblID | SNPID | EA | NEA | BETA | SE | variant_id | tss_distance | af | ma_samples | ma_count | pval_g | b_g | b_g_se | pval_i | b_i | b_i_se | pval_gi | b_gi | b_gi_se | chr | start | end | strand | name | type | vch | vbp | REF | ALT | __index_level_0__
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
str | str | str | i64 | i64 | str | f64 | f64 | f64 | f64 | f64 | i64 | i64 | str | str | str | str | str | f64 | f64 | str | i32 | f32 | i32 | i32 | f64 | f32 | f32 | f64 | f32 | f32 | f64 | f32 | f32 | i64 | i64 | i64 | i64 | str | str | i64 | i64 | str | str | i64
"AFR" | "Men" | "rs77969349" | 5 | 75980679 | "T/A" | 0.057 | 2.6 | 1.86 | 3.61 | 0.0 | 1252 | 0 | "IQGAP2" | "ENSG0000014570… | "5:75980679" | "A" | "T" | 0.954 | 0.169 | "5:76431767" | -123942 | 0.056709 | 67 | 71 | 0.001872 | -0.332542 | 0.106435 | 0.023716 | -0.082929 | 0.036569 | 0.000332 | 0.424971 | 0.117684 | 5 | 76403285 | 76708132 | 1 | "IQGAP2" | "protein_coding… | 5 | 76431767 | "C" | "G" | 18349813
"AFR" | "Men" | "rs77969349" | 5 | 75980679 | "T/A" | 0.057 | 2.6 | 1.86 | 3.61 | 0.0 | 1252 | 0 | "IQGAP2" | "ENSG0000014570… | "5:75980679" | "A" | "T" | 0.954 | 0.169 | "5:76443590" | -112119 | 0.066294 | 78 | 83 | 0.001465 | -0.259234 | 0.081086 | 0.020412 | -0.085938 | 0.03696 | 0.000471 | 0.332048 | 0.094412 | 5 | 76403285 | 76708132 | 1 | "IQGAP2" | "protein_coding… | 5 | 76443590 | "G" | "A" | 18350111

I want to create a regional association plot. Before doing so, I am trying to convert the table with this code:

data_gl = gl.Sumstats(data,
                      snpid="variant_id",
                      chrom="vch",
                      pos="vbp",
                      ea="ALT",
                      nea="REF",
                      eaf="af",
                      beta="b_gi",
                      se="b_gi_se",
                      p="pval_gi",
                      n="ma_samples",  # Int32
                      other=["tss_distance", "ma_count",
                             "GENE", "EnsemblID"],
                      build="hg19",
                      verbose=True)

However, this throws an error:

----> 1 data_gl = gl.Sumstats(data,
      2              snpid="variant_id",
...
--> 330 sumstats = sumstats.rename(columns=rename_dictionary)
    332 ## if n was provided as int #####################################################################################
    333 if type(n) is int:

TypeError: rename() got an unexpected keyword argument 'columns'

The N column is Int64, whereas the ma_samples column is Int32.

How can I fix this?
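
A note on the traceback: `rename(columns=...)` is the keyword form accepted by pandas' `DataFrame.rename`, and the `str` / `i64` / `f32` dtype row in the table above resembles how polars prints a schema. One possible cause, not confirmed anywhere in this thread, is therefore that `data` is a polars DataFrame rather than a pandas one, in which case the Int32 versus Int64 difference would be unrelated. A minimal sketch of a guard under that assumption follows; `as_pandas` is a hypothetical helper name, not part of gwaslab, and it relies only on the object exposing a `to_pandas()` method, as polars DataFrames do.

```python
def as_pandas(frame):
    """Return a pandas-style frame, converting when the object offers to_pandas()."""
    # Assumption: polars DataFrames expose .to_pandas(); pandas DataFrames do not,
    # so a pandas frame is passed through unchanged.
    converter = getattr(frame, "to_pandas", None)
    if callable(converter):
        return converter()
    return frame
```

With this in place, the call in the question would become `gl.Sumstats(as_pandas(data), ...)`, leaving all other arguments as they are.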

Cloufield commented 8 months ago

Sorry for my late reply. I was on vacation last week. I will look into this error and let you know soon.

Cloufield commented 8 months ago

Hi,

I tested your sample data and gwaslab seemed to work as expected. If you are loading from a file, you need to set the separator for your file using sep (the default is tab). For build, please set build="19" (I will make it recognize the "hg" prefix in a later version). Int32 won't affect the loading process. Please let me know if the error persists. Thanks.

  1. Loading from a pandas DataFrame

    image
  2. Loading from a file

    image
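
The advice about `build` can also be applied defensively before calling `gl.Sumstats`. The following is a minimal sketch, assuming only what the reply states (that `"19"` is accepted and the `"hg"` prefix is not yet recognized); `normalize_build` is a hypothetical helper and not a gwaslab function.

```python
def normalize_build(build):
    """Strip a leading 'hg' prefix so that 'hg19' becomes '19'."""
    text = str(build).strip()
    # Case-insensitive check, so 'HG19' is handled like 'hg19'.
    if text.lower().startswith("hg"):
        text = text[2:]
    return text
```

For example, passing `build=normalize_build("hg19")` would hand gwaslab the string `"19"`, while a value that is already `"19"` is left unchanged.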