GenoML / genoml2

GenoML (genoml2) is an open-source Python package. It is an automated machine learning (autoML) platform for genomics data.
Apache License 2.0

Exporting a list of features that go into the .h5 file after munging #8

Closed m-makarious closed 4 years ago

m-makarious commented 4 years ago


System information

Suggested by @h-leonard (thanks, Hampton!)

Describe the feature and the current behavior/state. Currently, munging exports only a file of features with their approximate relative importance. That file is written only when the --feature_selection flag is used, and features that contribute nothing are listed at the bottom with a score of 0.

...But what if you just wanted a list of features that made it to the *.dataForML.h5 file without having to open Python to read in and print out column names that remain in the final munged file?

The suggested feature would write the list of features to a text file that is created automatically when munging and updated if the --feature_selection flag is present.
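For anyone following along, here is a minimal sketch of what such an export could look like. It is not GenoML's actual implementation. The function names, the `skip` parameter, and the default HDF5 key `"dataForML"` are assumptions made for illustration, and reading the .h5 file assumes it was written by pandas and that PyTables is installed.

```python
from pathlib import Path
from typing import Iterable, List, Sequence


def read_h5_columns(h5_path: str, key: str = "dataForML") -> List[str]:
    """Return the column names stored in a munged HDF5 file.

    Assumes the file was written by pandas and that the table lives
    under `key`. The default key is an assumption, not confirmed here.
    """
    # Imported lazily so the writer below works without pandas installed.
    import pandas as pd

    df = pd.read_hdf(h5_path, key=key)
    return [str(col) for col in df.columns]


def write_feature_list(
    columns: Iterable[str],
    out_path: str,
    skip: Sequence[str] = (),
) -> List[str]:
    """Write one feature name per line to `out_path`.

    `skip` lists column names to leave out (for example, hypothetical
    sample-ID or phenotype columns that are not features).
    Returns the names that were written, in their original order.
    """
    features = [col for col in columns if col not in skip]
    text = "".join(f"{name}\n" for name in features)
    Path(out_path).write_text(text, encoding="utf-8")
    return features
```

With a hypothetical prefix, usage might look like `write_feature_list(read_h5_columns("prefix.dataForML.h5"), "prefix.list_features.txt")`. Rerunning the same call after feature selection would overwrite the text file, so it always reflects the columns that remain.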

Will this change the current api? How? Nope!

Who will benefit from this feature? Everyone who needs to go through and make sure the right features are being used.

Any Other info. N/A

m-makarious commented 4 years ago

Now when munging is run, a *_list_features.txt file is generated.

Example `_list_features.txt` file output:

image

Example interactive output without --feature_selection flag:

image

Example interactive output with --feature_selection flag:

image

EDIT AUGUST 3: The generated file has been renamed to *.list_features.txt

m-makarious commented 4 years ago

Moved issue to new repo for completeness and consistency.