The implementation should integrate with the models implemented in Scala by this project and use:
Scala
Spark RDD
Spark MLlib / GraphX (if appropriate)
Important note: For now the model can live in memory only, but it will need to be integrated with the ADAM format later on. You will probably need to create a new record type.
Description
This feature adds the --neighbour option(s), based on the input file described in #3. For more information, see: http://pngu.mgh.harvard.edu/~purcell/plink/strat.shtml#outlierAnalysis
Add a comment to this issue with:
NN
MIN_DST
Z
PROP_DIFF
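To make the fields listed above concrete, here is a minimal sketch of how NN, MIN_DST and Z might be derived from a precomputed pairwise measure. It rests on several assumptions: the record type `NeighbourRecord`, the object `NeighbourSketch` and the function `nearest` are hypothetical names, not part of this project; Z is taken to be the value standardised against all values at the same neighbour rank, using the sample standard deviation; and PROP_DIFF is left empty because the sketch does not compute it. It uses plain Scala collections so that it runs without a cluster.

```scala
// Hypothetical in-memory record for one row of the nearest-neighbour report.
final case class NeighbourRecord(
    id: String,                      // individual being described
    nn: Int,                         // neighbour rank (1 = closest)
    minDst: Double,                  // measure to the nn-th closest individual
    z: Double,                       // minDst standardised within its rank (assumption)
    neighbourId: String,             // the individual found at that rank
    propDiff: Option[Double] = None  // not computed in this sketch
)

object NeighbourSketch {

  /** `dist` holds one entry per unordered pair; lookups try both key orders. */
  def nearest(
      ids: Seq[String],
      dist: Map[(String, String), Double],
      maxNn: Int
  ): Seq[NeighbourRecord] = {
    def d(a: String, b: String): Double = dist.getOrElse((a, b), dist((b, a)))

    // Step 1: for each individual, rank the others and keep the closest maxNn.
    val ranked: Seq[(String, Int, Double, String)] = ids.flatMap { a =>
      ids.filter(_ != a)
        .map(b => (b, d(a, b)))
        .sortBy(_._2)
        .take(maxNn)
        .zipWithIndex
        .map { case ((b, dst), idx) => (a, idx + 1, dst, b) }
    }

    // Step 2: mean and sample standard deviation of the measure at each rank.
    val stats: Map[Int, (Double, Double)] =
      ranked.groupBy(_._2).map { case (rank, rows) =>
        val xs = rows.map(_._3)
        val mean = xs.sum / xs.size
        val variance =
          if (xs.size > 1) xs.map(x => math.pow(x - mean, 2)).sum / (xs.size - 1)
          else 0.0
        rank -> (mean, math.sqrt(variance))
      }

    // Step 3: attach the Z score to every (individual, rank) row.
    ranked.map { case (a, rank, dst, b) =>
      val (mean, sd) = stats(rank)
      val z = if (sd > 0) (dst - mean) / sd else 0.0
      NeighbourRecord(a, rank, dst, z, b)
    }
  }
}
```

The three steps use only `flatMap`, `map`, `groupBy` and `sortBy`, so a Spark version could plausibly follow the same shape on an RDD of pairs. Whether the measure is a distance or an IBS similarity, and how PROP_DIFF is defined, should be checked against the PLINK documentation linked above before settling the design.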
Design
Add a comment to this issue describing how this will be implemented in Spark and how the implementation differs from PLINK.
Also update the class diagram on the wiki page describing the PLink formats (if it is incomplete), and add a class diagram of the Scala models implemented for this feature to the wiki page on the MGL804 formats.