GenoML / genoml2

GenoML (genoml2) is an open-source Python package: an automated machine learning (autoML) platform for genomics data.
Apache License 2.0

--adjust_data crashes if features were removed during munging in earlier stages #19

Closed m-makarious closed 3 years ago

m-makarious commented 3 years ago


System information:

Describe the current behavior: Currently, munging removes features with a standard deviation of 0 prior to Z-scoring, and also removes SNPs if the --gwas and --p flags are given. As a result, when it is time to adjust the data by the target features the user listed in the .txt file, some of those features may already have been dropped, which causes GenoML to crash.
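
To illustrate the failure mode described above, here is a minimal sketch. It assumes the adjustment step selects the listed target columns from a pandas DataFrame; it is not GenoML's actual code, and the column names and the `drop_zero_variance` helper are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "ID": ["s1", "s2", "s3", "s4"],
    "AGE": [61.0, 70.0, 55.0, 66.0],
    "BATCH": [1.0, 1.0, 1.0, 1.0],  # constant, so its standard deviation is 0
})

def drop_zero_variance(data, id_col="ID"):
    """Stand-in for the munging step: drop columns whose standard deviation is 0."""
    stds = data.drop(columns=[id_col]).std()
    keep = stds[stds > 0].index.tolist()
    return data[[id_col] + keep]

munged = drop_zero_variance(df)

# Targets as a user might list them in the .txt file (assumed content).
targets = ["AGE", "BATCH"]

try:
    munged[targets]  # "BATCH" no longer exists in the munged data
    crashed = False
except KeyError as err:
    crashed = True
    print(f"KeyError: {err}")
```

Selecting a list of labels that includes a missing column raises `KeyError` in pandas, which is one plausible way a dropped feature could surface as a crash.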

Describe the expected behavior: GenoML should not crash; it should keep going. The fix is easy: compare the incoming .txt list to the munged dataframe and adjust only by the features that still remain.

Code to reproduce the issue: This might not happen to everyone, especially if the dataset is small enough or only significant features are kept. See above for how we came across this.


m-makarious commented 3 years ago

When adjusting the data now, there is a check at the top that intersects the incoming target list with the features in the munged dataset, so that only features that are still present get adjusted for.

Code below: [image not preserved]
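
The code in the comment above was posted as an image, so what follows is only a rough sketch of the kind of intersection check described, not the actual GenoML implementation. It assumes the .txt file lists one feature name per line; the function names `read_target_features` and `split_targets` and all feature names are hypothetical.

```python
import io

def read_target_features(handle):
    """Read one feature name per line, skipping blank lines (assumed file layout)."""
    return [line.strip() for line in handle if line.strip()]

def split_targets(targets, munged_columns):
    """Split targets into those still in the munged data and those already dropped."""
    available = set(munged_columns)
    kept = [t for t in targets if t in available]
    dropped = [t for t in targets if t not in available]
    return kept, dropped

# Hypothetical inputs: an in-memory stand-in for the .txt file and the
# column names of the munged dataframe.
target_file = io.StringIO("AGE\nBATCH\nrs123\n")
munged_columns = ["ID", "PHENO", "AGE", "rs456"]

kept, dropped = split_targets(read_target_features(target_file), munged_columns)
print("adjusting by:", kept)
print("skipped (removed during munging):", dropped)
```

Keeping the dropped names around, rather than silently discarding them, would let the tool tell the user which requested features were skipped.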