GenoML / genoml2

GenoML (genoml2) is an open-source Python package. It is an automated machine learning (autoML) platform for genomics data.
Apache License 2.0
27 stars 17 forks

Uniform set of tag SNPs for distributed learning. #2

Open mikeDTI opened 5 years ago

mikeDTI commented 5 years ago

Please make sure that this is a feature request.

System information

Describe the feature and the current behavior/state. Mike will make a uniform set of easily imputable tag SNPs to help with feature selection in distributed learning environments across data silos.

Will this change the current api? How? Additional download / filter.

Who will benefit with this feature? Anyone running distributed learning with common SNPs.

Any Other info. Pending as part of the new "neuroGlobo" array project. Based on TOPMed and implemented with TagIt, but will also include GBA N370S and LRRK2 G2019S because the proof of concept (PoC) is Parkinson's disease. Wojcik-ImputationAwareTagSNPSelection.pdf
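To illustrate the "additional download / filter" idea above, here is a minimal sketch of how a silo might restrict its features to a shared tag SNP list. This is not GenoML code: the function name `align_to_tag_snps`, the variant IDs, and the list contents are all hypothetical placeholders, and the real tag list would come from the TagIt-based selection.

```python
def align_to_tag_snps(silo_variants, tag_snps):
    """Split the shared tag list into SNPs this silo has and SNPs it lacks.

    Both results follow the order of tag_snps, so every silo ends up with
    its feature columns in the same order.
    """
    available = set(silo_variants)
    kept = [snp for snp in tag_snps if snp in available]
    missing = [snp for snp in tag_snps if snp not in available]
    return kept, missing


# Hypothetical shared tag list (placeholder IDs, not real array content).
TAG_SNPS = ["tag_snp_1", "tag_snp_2", "N370S", "G2019S"]

# Hypothetical variant lists from two data silos.
silo_a = ["G2019S", "tag_snp_2", "private_snp_9", "tag_snp_1", "N370S"]
silo_b = ["tag_snp_1", "N370S", "G2019S", "other_snp_4"]

kept_a, missing_a = align_to_tag_snps(silo_a, TAG_SNPS)
kept_b, missing_b = align_to_tag_snps(silo_b, TAG_SNPS)

# Features every silo can supply, still in tag-list order.
shared = [snp for snp in TAG_SNPS if snp in kept_a and snp in kept_b]
```

A silo with a non-empty `missing` list would presumably need to impute those SNPs before training, which is why the proposal asks for easily imputable tags.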

mikeDTI commented 3 years ago

Thinking we just make this into the NBA/GDA SNPs that tag more than 2 population ancestries well?
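One way to read "tag more than 2 population ancestries well" is sketched below, assuming each SNP has a per-ancestry tagging score between 0 and 1 (for example an r² or imputation quality value). The 0.8 cutoff, the ancestry labels, and every score are made-up assumptions for illustration, not values from NBA or GDA.

```python
def snps_tagging_multiple_ancestries(scores, min_score=0.8, min_ancestries=3):
    """Return SNPs whose score reaches min_score in at least min_ancestries groups.

    scores maps SNP ID -> {ancestry label -> tagging score}.
    The default min_ancestries=3 corresponds to "more than 2".
    """
    selected = []
    for snp, by_ancestry in scores.items():
        n_good = sum(1 for score in by_ancestry.values() if score >= min_score)
        if n_good >= min_ancestries:
            selected.append(snp)
    return sorted(selected)


# Hypothetical per-ancestry tagging scores (illustrative numbers only).
example_scores = {
    "snp_a": {"AFR": 0.91, "EUR": 0.88, "EAS": 0.85, "SAS": 0.60},
    "snp_b": {"AFR": 0.40, "EUR": 0.95, "EAS": 0.90, "SAS": 0.70},
    "snp_c": {"AFR": 0.82, "EUR": 0.81, "EAS": 0.80, "SAS": 0.83},
}

selected = snps_tagging_multiple_ancestries(example_scores)
```

Here `snp_b` is dropped because it passes the cutoff in only two ancestries, while `snp_a` and `snp_c` pass in three and four respectively.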

m-makarious commented 3 years ago

We could consider an additional directory with useful files such as this and perhaps PRS betas, GWAS summary stats in the correct format, among others?