GenoML / genoml2

GenoML (genoml2) is an open-source Python package: an automated machine learning (AutoML) platform for genomics data.
Apache License 2.0

Refactor GenoML munging code #23

Open rockonrob opened 3 years ago

rockonrob commented 3 years ago

Refactors the munging code for more programmatic access.

This refactoring should make the munging code easier to read and trace by separating the CLI, loading, and modifying functionality. Black was run on the code for automatic formatting (some lines were left unformatted). Most of the block print statements have been unified into a single print statement, so readers and users will see very similar output.
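
As a rough illustration of the CLI/loading/modifying split described above, here is a minimal sketch using only the standard library. It is not the code in this PR: the function names (`parse_args`, `load_pheno`, `recode_pheno`, `main`), the `--pheno` and `--prefix` flags, and the `ID`/`PHENO` columns are all hypothetical and chosen only to show the layering.

```python
import argparse
import csv
import io


def parse_args(argv=None):
    """CLI layer (hypothetical flags): only turns arguments into plain values."""
    parser = argparse.ArgumentParser(description="Illustrative munging entry point")
    parser.add_argument("--pheno", required=True, help="path to a phenotype CSV")
    parser.add_argument("--prefix", default="out", help="output prefix")
    return parser.parse_args(argv)


def load_pheno(handle):
    """Loading layer: read rows from an open file handle without changing them."""
    return list(csv.DictReader(handle))


def recode_pheno(rows):
    """Modifying layer: pure function that returns new rows with PHENO as an int."""
    return [{**row, "PHENO": int(row["PHENO"])} for row in rows]


def main(argv=None):
    """Wire the three layers together and emit a single summary print."""
    args = parse_args(argv)
    with open(args.pheno, newline="") as handle:
        rows = recode_pheno(load_pheno(handle))
    print(f"Munged {len(rows)} samples for prefix {args.prefix}")
    return rows
```

With this kind of layout, the loading and modifying steps can be called directly from Python, for example `recode_pheno(load_pheno(io.StringIO("ID,PHENO\ns1,0\n")))`, without going through the command line at all.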

Downstream, the main difference is that addit_df, pheno_df, and genotype_df are no longer stored in the dataForML.h5 file. This should reduce data duplication and speed up munging significantly.