AbbVie-ComputationalGenomics / SAIGEgds

Scalable Implementation of generalized mixed models using GDS files in Phenome-Wide Association Studies
7 stars 5 forks source link

Error: is not a SeqArray GDS file || kinship using genotyped data #5

Closed complexgenome closed 3 years ago

complexgenome commented 3 years ago

create GRM in .gds file format


library(SNPRelate)

bed.fn<-"mafed_LDed.bed"
fam.fn<-"mafed_LDed.fam"
bim.fn<-"mafed_LDed.bim"

# convert
snpgdsBED2GDS(bed.fn, fam.fn, bim.fn, "LDed_kinship.gds")

# open
genofile <- openfn.gds("LDed_kinship.gds")
genofile

# output to a GDS file
snpgdsGRM(genofile, method="GCTA", out.fn="GDS_LDed_kinship.gds")
# close
closefn.gds(genofile)

Fit model

library(SeqArray)
library(SAIGEgds)

pheno <-read.table("phenotype_PCcleaned_04.28.2021.txt", header=TRUE)

glmm <- seqFitNullGLMM_SPA(PHENO_FINAL ~ as.factor(SEX)+ AGE + PC1 +PC2+ PC3 , pheno, "GDS_LDed_kinship.gds" , trait.type="binary",sample.col="TOPMED_IID", num.thread=10)

Error:

SAIGE association analysis:
Wed Apr 28 14:41:43 2021
Open 'LDed_kinship.gds'
Error in seqOpen(gdsfile) :
  'LDed_kinship.gds' is not a SeqArray GDS file ('FileFormat' should be 'SEQ_ARRAY').

Resource: https://uw-gac.github.io/topmed_workshop_2017/computing-a-grm.html

I understand I do not have SEQ data, but genotyped data. Error in seqOpen(gdsfile) cannot work on genotyped data. I would not like to convert genotype data into vcf and re-run complete pipeline.

Any points to proceed further?

zhengxw-ab commented 3 years ago

You misuse SAIGEgds, please see: http://www.bioconductor.org/packages/release/bioc/vignettes/SAIGEgds/inst/doc/SAIGEgds.html#preparing-snp-data-for-genetic-relationship-matrix SAIGEgds does not use the resulting GDS from snpgdsGRM().

And you could use seqSNP2GDS() in SeqArray to convert SNPRelate GDS to SeqArray GDS.

complexgenome commented 3 years ago

Thank you. I updated my pipeline as:

library(SNPRelate)

bed.fn<-"updated_mafed.bed"
fam.fn<-"updated_mafed.fam"
bim.fn<-"updated_mafed.bim"

#convert
snpgdsBED2GDS(bed.fn, fam.fn, bim.fn, "MAFed_kinship.gds") ## create gds from SNP data

seqSNP2GDS("MAFed_kinship.gds", "SEQ_MHAS.gds") ## convert SNP gds to seq array

gds <- seqOpen( "SEQ_MHAS.gds") #open 
snpset <- snpgdsLDpruning(gds)

snpset.id <- unlist(snpset, use.names=FALSE)  # get the variant IDs of a LD-pruned set
head(snpset.id)

grm_fn <- "seqarray_grm_geno.gds" # output file name 
seqSetFilter(gds, variant.id=snpset.id)

##export to a GDS genotype file without annotation data
seqExport(gds, grm_fn, info.var=character(), fmt.var=character(), samp.var=character())

seqClose(gds)

I have following questions: 1- Is this workflow correct?

2- is the GRM in .gds read and fitted into the null model as a square matrix?

http://www.bioconductor.org/packages/release/bioc/vignettes/SAIGEgds/inst/doc/SAIGEgds.html#preparing-snp-data-for-genetic-relationship-matrix

zhengxw-ab commented 3 years ago
#convert
snpgdsBED2GDS(bed.fn, fam.fn, bim.fn, "MAFed_kinship.gds") ## create gds from SNP data
seqSNP2GDS("MAFed_kinship.gds", "SEQ_MHAS.gds") ## convert SNP gds to seq array

Please use SeqArray::seqBED2GDS()directly convert the PLINK dataset to SeqArray GDS:

seqBED2GDS(bed.fn, fam.fn, bim.fn, out.gdsfn, compress.geno="LZMA_RA",
    compress.annotation="LZMA_RA", chr.conv=TRUE, optimize=TRUE, digest=TRUE,
    parallel=FALSE, verbose=TRUE)