The ATOM Modeling PipeLine (AMPL) is an open-source, modular, extensible software pipeline for building and sharing models to advance in silico drug discovery.
MIT License
xgboost and RF models should support class balancing weights in loss function #318
The xgboost, RF and NN models each have their own way of handling imbalanced classification datasets through class-specific weights in the loss function, but we currently support this only for NN models, by setting the weight_transform_type parameter to 'balancing'. We should add this capability for random forest and xgboost models as well. For RF models, this means setting the class_weight parameter to 'balanced' when we create the RandomForestClassifier. For xgboost models, it means setting the scale_pos_weight parameter to sum(negative instances) / sum(positive instances), which applies to binary classification.
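A minimal sketch of the two settings described above, assuming binary labels encoded as 0 (negative) and 1 (positive). The helper names `xgb_scale_pos_weight`, `balanced_model_kwargs` and `build_classifier` are hypothetical and are not part of AMPL; only `class_weight` and `scale_pos_weight` are actual scikit-learn and xgboost parameters.

```python
from collections import Counter


def xgb_scale_pos_weight(labels):
    """Return count(negatives) / count(positives) for binary 0/1 labels."""
    counts = Counter(int(y) for y in labels)
    n_neg, n_pos = counts[0], counts[1]
    if n_pos == 0:
        raise ValueError("no positive instances; scale_pos_weight is undefined")
    return n_neg / n_pos


def balanced_model_kwargs(model_type, labels):
    """Hypothetical helper: extra constructor kwargs that turn on class balancing."""
    if model_type == "RF":
        # scikit-learn derives per-class weights from the training labels at fit time
        return {"class_weight": "balanced"}
    if model_type == "xgboost":
        return {"scale_pos_weight": xgb_scale_pos_weight(labels)}
    raise ValueError(f"unsupported model type: {model_type}")


def build_classifier(model_type, labels, **params):
    """Hypothetical helper: construct a classifier with balancing enabled.

    Imports are deferred so the helpers above work without either library installed.
    """
    kwargs = {**params, **balanced_model_kwargs(model_type, labels)}
    if model_type == "RF":
        from sklearn.ensemble import RandomForestClassifier
        return RandomForestClassifier(**kwargs)
    from xgboost import XGBClassifier
    return XGBClassifier(**kwargs)
```

For example, with 90 negative and 10 positive training labels, `xgb_scale_pos_weight` returns 9.0, and `build_classifier("xgboost", y_train, n_estimators=100)` would pass that value as `scale_pos_weight`. How this should be wired to an AMPL parameter (for instance, reusing weight_transform_type) is left open here.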