Refactors the munging code for more programmatic access.
This refactoring should make the munging code easier to read and trace by separating the CLI, loading, and modifying functionality. The code was auto-formatted with Black (some lines were left unformatted). Most of the block-print statements have been unified into a single print statement, so users should see very similar output to before.
Downstream, the main difference is that addit_df, pheno_df, and genotype_df are no longer written to the dataForML.h5 file. This should reduce overall data duplication and shorten munging times significantly.
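For downstream code that may still expect the removed entries, a quick check along these lines could help. This is a minimal sketch, not part of the PR: the helper names `find_removed_keys` and `check_file` are hypothetical, and it assumes the file is read with pandas `HDFStore` (which requires PyTables) and that the removed entries were stored as top-level keys.

```python
# Keys that this PR says are no longer written to dataForML.h5.
REMOVED_KEYS = {"addit_df", "pheno_df", "genotype_df"}


def find_removed_keys(store_keys):
    """Return, sorted, any removed keys present in a list of HDF5 store keys.

    pandas reports keys with a leading "/", so it is stripped before comparing.
    """
    normalized = {key.lstrip("/") for key in store_keys}
    return sorted(REMOVED_KEYS & normalized)


def check_file(path):
    """Open an HDF5 file read-only and report which removed keys it still holds.

    Hypothetical helper: assumes pandas and PyTables are installed.
    """
    import pandas as pd  # imported lazily so find_removed_keys works without it

    with pd.HDFStore(path, mode="r") as store:
        return find_removed_keys(store.keys())
```

An empty list from `check_file("dataForML.h5")` would suggest the file was produced after this change; a non-empty list would suggest it was written by the older munging code.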