AbbVie-ComputationalGenomics / SAIGEgds

Scalable Implementation of generalized mixed models using GDS files in Phenome-Wide Association Studies

From vcf to bgen to gds? dosage value or genotype probability value? #4

Closed: complexgenome closed this issue 3 years ago

complexgenome commented 3 years ago

Hi team,

I am interested in using this library for an admixed population with case-control imbalance. The data were imputed with the TOPMed panel on the Michigan Imputation Server. During the analysis, which values are read from the input GDS file: dosages, genotype probabilities, or genotype values?

Also, what is the best way to convert the data to GDS without hiccups? Using PLINK (v2.0) I can convert the imputed data to the .bgen format. Is "--export bgen-1.2" the right file format when using PLINK?

Thank you,

zhengxw-ab commented 3 years ago

See the help file:

seqAssocGLMM_SPA(gdsfile, modobj, maf=NaN, mac=10, missing=0.1, dsnode="",
    spa.pval=0.05, var.ratio=NaN, res.savefn="", res.compress="LZMA",
    parallel=FALSE, verbose=TRUE)

dsnode: "" for automatically searching the GDS nodes "genotype" and "annotation/format/DS", or use a user-defined GDS node in the file. (set dsnode="annotation/format/DS" to use imputed numeric dosages).
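
As a minimal sketch of the dsnode option described above, the call below forces the association test to read imputed dosages instead of relying on the automatic search. The wrapper name run_spa_with_dosage and its arguments gds_fn and out_fn are hypothetical names used only for illustration; modobj is assumed to be a null model object that was already fitted with SAIGEgds (for example by seqFitNullGLMM_SPA()), and gdsfile is assumed to accept a file name.

```r
# Hypothetical wrapper (not part of SAIGEgds): run the SPA association test
# using the imputed numeric dosages stored in the GDS file.
#   gds_fn : path to a SeqArray GDS file (assumed to contain annotation/format/DS)
#   modobj : a previously fitted SAIGEgds null model object
#   out_fn : optional output file name; "" returns the results instead of saving
run_spa_with_dosage <- function(gds_fn, modobj, out_fn = "") {
    SAIGEgds::seqAssocGLMM_SPA(
        gds_fn, modobj,
        dsnode = "annotation/format/DS",  # read dosages explicitly
        res.savefn = out_fn
    )
}
```

All other arguments (maf, mac, missing, and so on) keep the defaults shown in the help text.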

If your files were created by the Michigan Imputation Server with the TOPMed panel, use SeqArray::seqVCF2GDS() to convert the VCF to GDS directly; the resulting file should include the imputed dosages. In addition, set scenario="imputation" in SeqArray::seqVCF2GDS() to tell SeqArray that you are working with imputed data.
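
The conversion could look roughly like the sketch below. The function name convert_imputed_vcf and the arguments vcf_fn and gds_fn are hypothetical; the check for the dosage node assumes that the VCF carries a DS FORMAT field, which is what ends up at annotation/format/DS in the GDS file.

```r
# Hypothetical helper (not part of SeqArray): convert an imputed VCF to GDS
# and confirm that the dosage node is present in the result.
convert_imputed_vcf <- function(vcf_fn, gds_fn) {
    # scenario = "imputation" tells SeqArray the input is imputed data
    SeqArray::seqVCF2GDS(vcf_fn, gds_fn, scenario = "imputation")

    # Reopen the new file and look for the imputed dosage node
    f <- SeqArray::seqOpen(gds_fn)
    on.exit(SeqArray::seqClose(f))
    node <- gdsfmt::index.gdsn(f, "annotation/format/DS", silent = TRUE)
    if (is.null(node)) {
        warning("No annotation/format/DS node found; the VCF may lack a DS field.")
    }
    invisible(gds_fn)
}

# Usage, with placeholder file names:
# convert_imputed_vcf("chr22.dose.vcf.gz", "chr22.gds")
```

With this route there is no need for an intermediate BGEN file.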

complexgenome commented 3 years ago

Perfect.