MTG / gaia

C++ library to apply similarity measures and classifications on the results of audio analysis, including Python bindings. Together with Essentia it can be used to compute high-level descriptions of music.
http://essentia.upf.edu
GNU Affero General Public License v3.0

Add weight parameter for training C-SVC SVMs on unbalanced data #18

Open dbogdanov opened 9 years ago

dbogdanov commented 9 years ago

Libsvm supports unbalanced data through class weights. Add a new SVM configuration parameter that makes it possible to set a weight for each class. Currently the weights are unassigned by default, and there is no way to configure them.
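For illustration, here is a minimal sketch of one common way to derive such weights: inverse class frequency, `n_samples / (n_classes * count)`. In libsvm, a class weight multiplies the C parameter for that class (the `-wi` option for C-SVC). The function name and the toy labels below are hypothetical and are not part of Gaia's API.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {label: weight} with weight = n_samples / (n_classes * count).

    Rare classes get a weight above 1 and frequent classes a weight below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Toy unbalanced dataset (assumed for the example): 90 "rock" vs 10 "jazz".
labels = ["rock"] * 90 + ["jazz"] * 10
weights = balanced_class_weights(labels)

for label, weight in sorted(weights.items()):
    print(f"{label}: {weight:.3f}")
```

With these weights, the minority class is penalized more heavily per misclassified sample, so both classes contribute equally to the total penalty.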

Make sure that cross-validation is not affected by unbalanced data as well.

palonso commented 5 years ago

Implemented in 3c2ca55 (PR #86)

However, when you say

Make sure cross-validation is not affected by unbalanced data

the expected behavior is not clear to me. My intuition was to compute the weights for each fold rather than globally for the whole dataset, so that the SVM input is always balanced. Do you agree?
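The per-fold idea could be sketched roughly as follows: split the data randomly into folds, then recompute the weights from the training part of each fold only. This is an illustration under assumptions, not the code from the commit above; `random_folds`, `balanced_class_weights`, and the toy labels are hypothetical names.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count)."""
    counts = Counter(labels)
    return {c: len(labels) / (len(counts) * n) for c, n in counts.items()}


def random_folds(n_samples, n_folds, seed=0):
    """Shuffle the indices and deal them into n_folds test folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


# Toy unbalanced dataset (assumed for the example).
labels = ["rock"] * 90 + ["jazz"] * 10
folds = random_folds(len(labels), n_folds=5)

per_fold_weights = []
for test_idx in folds:
    held_out = set(test_idx)
    # Weights are derived from the training labels of this fold only.
    train_labels = [lab for i, lab in enumerate(labels) if i not in held_out]
    per_fold_weights.append(balanced_class_weights(train_labels))

for k, fold_weights in enumerate(per_fold_weights):
    print(k, {c: round(w, 3) for c, w in sorted(fold_weights.items())})
```

Because the split is purely random, the class ratio in each training set differs slightly, so the weights differ from fold to fold. Within each fold, weight times class count is the same for every class present in the training data.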

dbogdanov commented 5 years ago

That is correct. However, we can also consider doing stratified splits (e.g., see sklearn), as currently we just split all the data randomly, disregarding the distribution of labels. What do you think?
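A stratified split can be sketched in plain Python as below: indices are grouped by label, shuffled within each class, and dealt round-robin into the folds, so every fold keeps roughly the global class ratio. This is only an illustration of the idea; `stratified_folds` is a hypothetical helper, not Gaia code. sklearn provides the same functionality as `sklearn.model_selection.StratifiedKFold`.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds, seed=0):
    """Return n_folds lists of indices, each preserving the label distribution."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the samples of this class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


# Toy unbalanced dataset (assumed for the example).
labels = ["rock"] * 90 + ["jazz"] * 10
folds = stratified_folds(labels, n_folds=5)
jazz_per_fold = [sum(labels[i] == "jazz" for i in fold) for fold in folds]
print(jazz_per_fold)  # [2, 2, 2, 2, 2]
```

Note that dealing always starts at the first fold, so with many classes whose sizes are not multiples of the fold count, the earlier folds end up slightly larger. A production version would likely rotate the starting fold per class.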