MRCIEU / gwas2vcf

Convert GWAS summary statistics to VCF
MIT License
47 stars 18 forks source link

a dataset processed with gwas2vcf is different from the same dataset preprocessed by Opengwas #78

Open matteofloris opened 2 years ago

matteofloris commented 2 years ago

During an attempt to perform a colocalization test with gwasglue, I noticed that a dataset processed with gwas2vcf is different from the same dataset preprocessed by Opengwas. I give an example below.

1) download sumstats for GCST90001585 study:

wget http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90001001-GCST90002000/GCST90001585/GCST90001585_buildGRCh37.tsv.gz):

2) extract first 10.000 rows from sumstats (just for debugging)

zless GCST90001585_buildGRCh37.tsv.gz | head -n 10000 > test.csv

3) run main.py on the test sumstats:

/usr/bin/python3.8 main.py --data test.csv --json params.json --id test --ref human_g1k_v37.fasta --dbsnp dbsnp.v153.b37.vcf.gz --out $DIR/test.vcf

with the following params:

{
  "chr_col": 0,
  "pos_col": 1,
  "ea_col": 2,
  "oa_col": 3,
  "beta_col": 7,
  "se_col": 8,
  "ncontrol_col": 4,
  "pval_col": 9,
  "eaf_col": 6,
  "delimiter": "\t",
  "header": true,
  "build": "GRCh37"
}

4) with the Opengwas dataset I am able to perform colocalisation of this dataset against itself, while it is not possible to do the same process with the dataset generated by gwas2vcf, because I obtain this error:

> vres <- coloc::coloc.abf(vout[[1]], vout[[2]])
Error in check_dataset(d = dataset1, 1) : 
  dataset 1: MAF should be a numeric, strictly >0 & <1

To understand the reason, I tried to compare the vcf generated by gwas2vcf with the vcf available at https://gwas.mrcieu.ac.uk/datasets/ebi-a-GCST90001585/, for example at position 54421:

zless ebi-a-GCST90001585.vcf.gz | grep -v "#" | head -n 100| grep 54421
1   54421   rs146477069 A   G   .   PASS    AF=0.0019   ES:SE:LP:AF:ID  -0.3265:0.3052:0.54546:0.0019:rs146477069
zless test.vcf.gz| grep -v "#" | head -n 10 | grep 54421
1   54421   rs146477069 A   G   .   PASS    AF=0.0019   ES:SE:LP:AF:SS:ID   -0.3265:0.3052:0.54546:0.0019:3629:rs146477069

the only difference I see is the presence of AF and SS fields in the last column of the VCF. Do you have any explanation about this issue?

mcgml commented 2 years ago

Hi @matteofloris I'm sorry to hear you're having problems. The VCF looks fine. I think the MRCIEU/gwasglue package is parsing the values incorrectly. I have created an issue.

matteofloris commented 2 years ago

thank you so much Matthew!

Il giorno gio 30 giu 2022 alle ore 19:12 Matthew Lyon < @.***> ha scritto:

Hi @matteofloris https://github.com/matteofloris I'm sorry to hear you're having problems. The VCF looks fine. I think the MRCIEU/gwasglue package is parsing the values incorrectly. I have created an issue https://github.com/MRCIEU/gwasglue/issues/33.

— Reply to this email directly, view it on GitHub https://github.com/MRCIEU/gwas2vcf/issues/78#issuecomment-1171476997, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACZ5JNG2L5PL5PKI2KCFRGLVRXIPZANCNFSM52GPEJQQ . You are receiving this because you were mentioned.Message ID: @.***>

--

Matteo Floris, PhD Associate Professor - Medical Genetics Department of Biomedical Sciences University of Sassari (Italy) Tel +39.333.48.57.679 email: @., @.

Tutto quello che non so l'ho imparato a scuola (Leo Longanesi)