Background
`Splitset.make` is where the sample indices that make up the splits are defined.

Problem
When labels are imbalanced, the network becomes biased toward the majority classes and performs poorly on the minority classes.
We need a way to downsample the majority classes so that categorical labels are balanced before the splits are created.

Solution
- Add a `max_imbalance: float` parameter (greater than 1.0) to `Splitset.make`.
- See the "Balance" section in this gist: https://gist.github.com/aiqc/d8d4b5e74a8811b3d8657c65cb3c6e7f
- Programmatically figure out which class is the minority.
- Downsample each majority class by randomly selecting `count of minority class * max_imbalance` samples from it. Reset the index of the labels and features if necessary, and throw out the indices that were not selected.
- Reference the lines surrounding `np.issubdtype` related to `bin_count` for help determining categorical vs. continuous labels.
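A minimal sketch of the balancing steps above, assuming the labels arrive as a 1-D NumPy array and that floating-point labels indicate a continuous target. `is_categorical` and `balance_indices` are hypothetical helpers, not existing AIQC functions, and the dtype check here is only a stand-in for the `np.issubdtype`/`bin_count` logic the issue points to.

```python
import numpy as np


def is_categorical(labels: np.ndarray) -> bool:
    # Assumption: float labels are continuous; ints, strings, and bools are
    # categorical. The real check should mirror the np.issubdtype logic
    # near bin_count.
    return not np.issubdtype(labels.dtype, np.floating)


def balance_indices(labels, max_imbalance: float, seed=None) -> np.ndarray:
    """Hypothetical helper: return sorted sample indices in which every class
    keeps at most int(count of minority class * max_imbalance) samples."""
    labels = np.asarray(labels)
    if max_imbalance <= 1.0:
        raise ValueError("max_imbalance must be greater than 1.0")
    if not is_categorical(labels):
        raise ValueError("balancing only applies to categorical labels")

    # Find the minority class by counting samples per class.
    classes, counts = np.unique(labels, return_counts=True)
    minority_count = counts.min()
    # Per-class cap; rounding down with int() is an assumption of this sketch.
    cap = int(minority_count * max_imbalance)

    rng = np.random.default_rng(seed)
    keep = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(labels == cls)
        if count > cap:
            # Majority class: randomly select `cap` samples without replacement.
            idx = rng.choice(idx, size=cap, replace=False)
        keep.append(idx)
    # Unselected indices are thrown out; sorting preserves the original order.
    return np.sort(np.concatenate(keep))


# Usage: 90 samples of class 0 and 10 of class 1, max_imbalance of 2.0.
labels = np.array([0] * 90 + [1] * 10)
features = np.arange(200).reshape(100, 2)
kept = balance_indices(labels, max_imbalance=2.0, seed=0)
labels_bal = labels[kept]      # class 0 keeps 20 samples, class 1 keeps all 10
features_bal = features[kept]  # fancy indexing yields arrays re-indexed from 0
```

Because the helper returns positional indices, the same array subsets both labels and features, so they stay aligned. With NumPy the result is already re-indexed from 0; if the data were held in pandas objects instead, `df.iloc[kept].reset_index(drop=True)` would have the same effect. The minority class is never downsampled here, since `max_imbalance > 1.0` keeps the cap above its count.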