AbbVie-ComputationalGenomics / SAIGEgds

Scalable Implementation of generalized mixed models using GDS files in Phenome-Wide Association Studies

Version of SAIGE used in SAIGEgds ? #3

Open ldcato opened 3 years ago

ldcato commented 3 years ago

Hi, this is a great idea and a really useful package. Two questions:

  1. I note from the README.md that this package is "based on the original SAIGE package (v0.29.4.4)", but I'm not certain whether that means this package uses SAIGE v0.29.4.4. Does this mean it has not been updated as SAIGE has progressed to 0.43.3? I can see a number of beneficial updates in the ChangeLog for SAIGE, and I'm wondering if any of these have been implemented in this package, as I can see there have been separate improvements here (such as the use of ACAT-O, where SAIGE only allows SKAT-O).

  2. In SAIGE, gene-based testing has an option, minMAFforGRM, to ensure that only variants with MAF>0.05 or 0.01 are used to create the GRM. Is this done automatically in the SAIGEgds::seqFitNullGLMM_SPA() function? I can't see an argument for it in the man pages. However, I see that the argument "variant.id" is used to select the variants for the GRM (after the pruning step in the tutorial), so could an additional subset step on this vector, using allele frequencies derived from the pruned GDS file, solve the problem? Please do let me know if I am missing something in this function! Thank you.

Thank you so much for your work on this!

zhengxw-ab commented 3 years ago
  1. SAIGE_v0.29.4.4 implements the method described in "Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies" (2018). Later versions introduced the SKAT-based SAIGE-GENE tests; I have no plan to fully follow the development of SAIGE-GENE.
  2. SeqArray::seqSetFilterCond() can select the variants with MAF>0.05 or 0.01; SeqArray::seqGetData(f, "variant.id") then returns the selected variant IDs, which can be passed to the "variant.id" argument.
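
For anyone landing here later, the suggestion in point 2 might look roughly like the sketch below. It is only an illustration under stated assumptions: the example GDS file bundled with SeqArray stands in for the real pruned file, `pruned_ids` is a hypothetical placeholder for the output of the tutorial's pruning step, and the 0.05 threshold is just the value mentioned in the question. The final call to seqFitNullGLMM_SPA() is left commented out because its formula and phenotype data depend on the study.

```r
library(SeqArray)

# Assumption: the example file shipped with SeqArray stands in for your own GDS file.
f <- seqOpen(seqExampleFileName("gds"))

# Hypothetical placeholder for the variant IDs kept by the LD-pruning step;
# here it is simply every variant in the file.
pruned_ids <- seqGetData(f, "variant.id")

# Select variants by minor allele frequency (threshold 0.05 taken from the question).
seqSetFilterCond(f, maf=0.05, verbose=FALSE)
maf_ids <- seqGetData(f, "variant.id")

# Clear the filter so later steps see the whole file again.
seqResetFilter(f, verbose=FALSE)

# Keep only the variants that are both pruned and pass the MAF threshold.
grm_variant_id <- intersect(pruned_ids, maf_ids)

# The resulting IDs would then go to the "variant.id" argument, e.g.
# (formula and phenotype data frame are study-specific, so this is not run):
# glmm <- SAIGEgds::seqFitNullGLMM_SPA(y ~ x1, pheno, f,
#     trait.type="binary", variant.id=grm_variant_id)

seqClose(f)
```

Intersecting the two ID vectors in plain R avoids relying on how the MAF filter interacts with a filter that is already set on the file.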