felixbur / nkululeko

Machine learning speaker characteristics
MIT License

automatically detect labels and bins #61

Open felixbur opened 10 months ago

felixbur commented 10 months ago

I guess it would be better if the labels did not need to be given explicitly but were read from the data file automatically. I meant that the labels [anger, disgust, happy...] are already in the data. Currently you have to tell nkululeko which labels to use, but if you want all of them, that shouldn't be necessary.

For regression I would define a default binning, e.g. automatically assign three bins (low, medium, high) and set the borders so that the samples are equally distributed across the bins.
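A minimal sketch of such a default binning, assuming pandas is available. The function name `bin_equal_frequency` and the sample scores are made up for illustration and are not part of nkululeko:

```python
import pandas as pd

def bin_equal_frequency(values, names=("low", "medium", "high")):
    """Split continuous values into len(names) bins holding roughly equal counts."""
    series = pd.Series(values, dtype="float64")
    # qcut places the borders at quantiles, so each bin receives
    # about the same number of samples
    binned, borders = pd.qcut(
        series, q=len(names), labels=list(names), retbins=True
    )
    return binned, borders

# made-up arousal scores, for illustration only
scores = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
binned, borders = bin_equal_frequency(scores)
print(binned.value_counts())
print(borders)
```

Note that `pd.qcut` raises an error when many identical values make two borders coincide, so heavily tied scores would need extra handling.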

bagustris commented 10 months ago

> I meant that the labels [anger, disgust, happy...] are already in the data.

Yes, this should be the default if no labels are given in the [DATA] section of the INI file. If the labels option is set in the [DATA] section, those labels should be used.
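The fallback could look roughly like this. It is only a sketch: the helper `resolve_labels`, the target column name `emotion`, and the assumption that the labels option holds a Python-style list are all illustrative, not nkululeko's actual code:

```python
import ast
import configparser
import pandas as pd

def resolve_labels(config, df, target="emotion"):
    """Use labels from [DATA] if present, otherwise every label found in the data."""
    raw = config.get("DATA", "labels", fallback=None)
    if raw is not None:
        # assumption: the option holds a Python-style list, e.g. ['anger', 'happy']
        return list(ast.literal_eval(raw))
    # no labels configured: take all distinct values of the target column
    return sorted(df[target].dropna().unique())

# made-up data, for illustration only
df = pd.DataFrame({"emotion": ["happy", "anger", "disgust", "anger"]})

explicit = configparser.ConfigParser()
explicit.read_string("[DATA]\nlabels = ['anger', 'happy']\n")

implicit = configparser.ConfigParser()
implicit.read_string("[DATA]\n")

print(resolve_labels(explicit, df))  # labels taken from the INI file
print(resolve_labels(implicit, df))  # labels detected from the data
```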

For regression, we should actually treat it as real regression, i.e., predicting a continuous score. Let's use examples from the iemocap, msp-improv and msp-podcasts datasets. The data format is usually "file, valence, arousal, dominance, naturalness", where the last four columns, from valence to naturalness, are continuous scores. The output should be a continuous score. In this case the label is required (the name of the header to predict).
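To illustrate why the label is required here, a small sketch that picks the regression target by header name. The function `load_regression_target` and the two data rows are invented for this example:

```python
import io
import pandas as pd

# made-up rows in the "file, valence, arousal, dominance, naturalness" layout
CSV_TEXT = """file,valence,arousal,dominance,naturalness
a.wav,2.5,3.0,3.5,4.0
b.wav,4.0,2.0,2.5,3.5
"""

def load_regression_target(csv_file, label=None):
    """Return the continuous column named by `label`, indexed by file."""
    df = pd.read_csv(csv_file, index_col="file")
    if label is None:
        # with several continuous columns there is no sensible default
        raise ValueError("regression needs a label: the header of the column to predict")
    if label not in df.columns:
        raise ValueError(f"unknown label {label!r}; available: {list(df.columns)}")
    return df[label].astype(float)

valence = load_regression_target(io.StringIO(CSV_TEXT), label="valence")
print(valence)
```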

Binning can be added too, to map from regression to classification and to provide further analysis.