HunterMcGushion / hyperparameter_hunter

Easy hyperparameter optimization and automatic result saving across machine learning algorithms and libraries
MIT License

How can I set class weights in multiclass classification with an imbalanced dataset? #183

Open alegarbed opened 5 years ago

alegarbed commented 5 years ago

I'm having difficulty setting different class weights for a multiclass classification problem. The proper way to set class_weight is with a dictionary, but I can only use the Real, Integer, and Categorical parameters. Is there any solution? Can you provide a simple example? Thank you in advance.
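
For example, in plain scikit-learn I would normally pass the weights as a dict mapping each class label to its weight, something like this (the weight values here are just illustrative):

from sklearn.ensemble import RandomForestClassifier

# Plain scikit-learn usage: `class_weight` maps each class label to a weight,
# so heavier weights make that class count more during training
# (the values below are illustrative only)
clf = RandomForestClassifier(
    n_estimators=10,
    class_weight={0: 1, 1: 2, 2: 5},
)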

HunterMcGushion commented 5 years ago

Thanks for opening this, @alegarbed! Yes, you can optimize class_weight values! Here's a basic example with SKLearn's RandomForestClassifier and the Iris dataset.

from hyperparameter_hunter import Environment, CVExperiment
from hyperparameter_hunter import BayesianOptPro, Integer, Categorical
from hyperparameter_hunter.utils.learning_utils import get_iris_data
from sklearn.ensemble import RandomForestClassifier

# The Environment defines the shared dataset, target column, metric, and cross-validation scheme
env = Environment(
    train_dataset=get_iris_data(),
    results_path="HyperparameterHunterAssets",
    target_column="species",
    metrics=["hamming_loss"],
    cv_params=dict(n_splits=5, shuffle=True, random_state=32),  # shuffle so `random_state` takes effect
)

# Just a reference for normal `class_weight` usage outside of optimization
exp = CVExperiment(
    RandomForestClassifier, {"n_estimators": 10, "class_weight": {0: 1, 1: 1, 2: 1}}
)

# Bayesian optimization; the search space is declared in `forge_experiment` below
opt = BayesianOptPro(iterations=10, random_state=32)
opt.forge_experiment(
    model_initializer=RandomForestClassifier,
    model_init_params=dict(
        #################### LOOK DOWN ####################
        class_weight={
            0: Categorical([1, 3]),
            1: Categorical([1, 4]),
            2: Integer(1, 9),  # You can also use `Integer` for low/high ranges
        },
        #################### LOOK UP ####################
        criterion=Categorical(["gini", "entropy"]),
        n_estimators=Integer(5, 100),
    ),
)
opt.go()
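
To be clear about what's happening, each iteration just fills in concrete values for the nested Categorical/Integer dimensions, so a single sampled configuration is roughly equivalent to an ordinary initialization like this (these particular values are illustrative, not actual results):

# One configuration the optimizer might try -- the nested dimensions
# collapse back into a normal `class_weight` dict
# (values below are illustrative only)
RandomForestClassifier(
    class_weight={0: 3, 1: 1, 2: 7},
    criterion="entropy",
    n_estimators=42,
)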

This should definitely be included in one of our examples, or at least documented, so thanks for asking again!

Side note: I just noticed that the automatic Experiment matching during optimization isn't working for this, which is a bug, so I'll look into it and update you.