GELOG / adam-ibs

Ports the IBS/MDS/IBD functionality of Plink to Spark / ADAM
Apache License 2.0

Association analysis #28

Open davidonlaptop opened 9 years ago

davidonlaptop commented 9 years ago

Description

This feature adds the --assoc, --model, --fisher, --linear, --logistic, --ci, --counts, --cell, --within, --mh, --mh2, --bd, --homog, --gene-drop, --T2, --qt-means, --gxe, --covar, --reference-allele, --beta, --standard-beta, --genotypic, --hethom, --dominant, --recessive options, based on the input file described in #2.

This feature also generates the following file formats: ASSOC, FISHER, MODEL, BEST.PERM, BEST.MPERM, GEN.PERM, TREND.PERM, DOM.PERM, REC.PERM, CMH, CMH2, HOMOG, T2.PERM, T2.MPERM, QASSOC, ASSOC.PERM, ASSOC.MPERM, QASSOC.MEANS, QASSOC.GXE, ASSOC.LINEAR, ASSOC.LOGISTIC. For more info, check: http://pngu.mgh.harvard.edu/~purcell/plink/anal.shtml
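As a starting point for the discussion, here is a minimal sketch of the basic case/control allelic test behind --assoc: a 1-df Pearson chi-square on the 2x2 table of allele counts, plus the odds ratio. It is plain Scala with no Spark or ADAM dependency. The names `AlleleCounts`, `AssocResult` and `AllelicAssoc` are hypothetical and do not exist in this project. The P-value is left out because it needs a chi-square CDF, which the standard library does not provide.

```scala
// Hypothetical sketch; none of these names come from adam-ibs, ADAM or Plink.

/** Allele counts for one variant: A1 (minor) and A2 (major) alleles in cases and controls. */
final case class AlleleCounts(snp: String, caseA1: Long, caseA2: Long, ctrlA1: Long, ctrlA2: Long)

/** Output of the allelic test for one variant. */
final case class AssocResult(
  snp: String,
  freqCases: Double,    // frequency of A1 in cases
  freqControls: Double, // frequency of A1 in controls
  chiSq: Double,        // 1-df Pearson chi-square, no continuity correction
  oddsRatio: Double
)

object AllelicAssoc {
  def test(counts: AlleleCounts): AssocResult = {
    val a = counts.caseA1.toDouble
    val b = counts.caseA2.toDouble
    val c = counts.ctrlA1.toDouble
    val d = counts.ctrlA2.toDouble
    val n = a + b + c + d

    // Pearson chi-square for a 2x2 table: n(ad - bc)^2 / (row and column totals).
    val denom = (a + b) * (c + d) * (a + c) * (b + d)
    val chiSq = if (denom == 0.0) Double.NaN else n * math.pow(a * d - b * c, 2) / denom

    // Odds ratio of A1 in cases versus controls; undefined when a cell in the divisor is empty.
    val oddsRatio = if (b * c == 0.0) Double.NaN else (a * d) / (b * c)

    AssocResult(counts.snp, a / (a + b), c / (c + d), chiSq, oddsRatio)
  }
}
```

For example, `AllelicAssoc.test(AlleleCounts("rs1", 30, 70, 20, 80))` gives a chi-square of about 2.67 and an odds ratio of about 1.71. Because each variant is tested independently, a function of this shape could presumably be passed to a `map` over a distributed collection of per-variant counts; that is a design assumption to validate in the analysis, not a description of existing code.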

Scope

The scope of this feature may be too large to implement completely during MGL804. However, many of the file formats seem to share a similar structure. In any case, the scope needs to be refined with Beatriz.

From a development point of view, it is worth noting that this feature is quite distinct from the other features, so it could easily be implemented in parallel with them.

Note: I stopped listing the options and file formats at the "Covariates and interactions" section.

Analysis

Add a comment to this issue describing how this will be implemented in Spark, and how it differs from plink.

Also update the class diagram on the wiki page describing PLink formats (where it is incomplete), and add a class diagram of the Scala models implemented for this feature to the wiki page on the MGL804 formats.

Implementation

The implementation should integrate with the models implemented in Scala by this project and use: