Closed: mikeDTI closed this issue 4 years ago
The Code:
import numpy as np

# Remove any columns with a standard deviation of zero
print("Removing any columns that have a standard deviation of 0 prior to Z-scaling...")
# numeric_only=True skips non-numeric columns such as 'ID'; recent pandas versions raise a TypeError without it
col_std = addit_df.std(numeric_only=True)
zero_sd_cols = col_std[col_std == 0.0].index.values
if len(zero_sd_cols) > 0:
    print("")
    print("Looks like there's at least one column with a standard deviation of 0. Let's remove that for you...")
    # Keep every column except the zero-variance ones
    addit_df = addit_df.drop(columns=zero_sd_cols)
    addit_keep_list = list(addit_df.columns.values)
    addit_keep_list.remove('ID')
    # Report which of the original feature columns were dropped
    removed_list = np.setdiff1d(cols, addit_keep_list)
    for removed_column in removed_list:
        print("")
        print(f"The column {removed_column} was removed")
        print("")
    cols = addit_keep_list
Description: Munging has been updated to check whether any columns have a standard deviation of 0.
Because such columns carry no information and cause issues downstream (Z-scaling divides by the standard deviation), any column with a standard deviation of 0 is now removed, and the user is told interactively which columns were removed.
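A self-contained sketch of the same idea, assuming a pandas DataFrame with an `ID` column plus numeric feature columns. `drop_zero_variance` and `z_score` are hypothetical helper names, not GenoML functions, and the toy data is made up for illustration.

```python
import pandas as pd


def drop_zero_variance(df: pd.DataFrame, id_col: str = "ID"):
    """Hypothetical helper: return (filtered_df, removed_columns).

    Drops every numeric feature column whose standard deviation is exactly 0.
    The ID column is always kept.
    """
    features = df.drop(columns=[id_col])
    col_std = features.std(numeric_only=True)
    removed = list(col_std[col_std == 0.0].index)
    return df.drop(columns=removed), removed


def z_score(df: pd.DataFrame, id_col: str = "ID") -> pd.DataFrame:
    """Hypothetical helper: Z-scale every feature column as (x - mean) / std.

    Uses the pandas default sample standard deviation (ddof=1) and assumes
    all non-ID columns are numeric.
    """
    out = df.copy()
    features = [c for c in out.columns if c != id_col]
    out[features] = (out[features] - out[features].mean()) / out[features].std()
    return out


# Toy data: gene_b is constant, so its standard deviation is 0
data = pd.DataFrame({
    "ID": ["s1", "s2", "s3", "s4"],
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [7.0, 7.0, 7.0, 7.0],
    "gene_c": [0.5, 0.1, 0.9, 0.3],
})

kept, removed = drop_zero_variance(data)
for col in removed:
    print(f"The column {col} was removed")  # prints: The column gene_b was removed

# Safe to Z-scale now: no remaining feature has a standard deviation of 0
scaled = z_score(kept)
```

Returning the removed column names separately keeps the user-facing reporting out of the filtering logic, so the same helper could be reused where no messages are wanted.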
Moved issue to new repo for completeness and consistency
System information
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
GenoML installed from (source or binary): Source
GenoML version: 1.5!
Python version: 3.7, only real G's use that one
Describe the current behavior: Will crash when Z-scoring a feature without variance
Describe the expected behavior: Interesting people put non-variant columns into ML experiments??? Mary is going to add an extra condition at line 210 of munge that skips columns with a standard deviation of 0 and leaves a snarky message ;-)
Code to reproduce the issue: AMP PD transcriptomics
Other info / logs: Ask MM
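A minimal illustration of why a feature without variance breaks Z-scoring, assuming the Z-score is computed as (x - mean) / std. The constant array is made up; we assume the NaNs it produces are what later steps then fail on, which would be the reported crash.

```python
import numpy as np

# A feature with no variance: every sample has the same value
constant = np.array([7.0, 7.0, 7.0, 7.0])

# Z-scoring divides by the standard deviation, which is 0 here,
# so every entry becomes 0 / 0 = NaN (NumPy would normally emit a RuntimeWarning)
with np.errstate(invalid="ignore"):
    z = (constant - constant.mean()) / constant.std()

print(z)  # [nan nan nan nan]
```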