felixbur / nkululeko

Machine learning speaker characteristics
MIT License

ensemble #103

Closed felixbur closed 4 months ago

felixbur commented 10 months ago

Currently, unlike features, models cannot be combined. The easiest way would be to try late fusion, i.e. to take the outputs of several models as the input of a "meta model".
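As a rough illustration of the late-fusion idea, the sketch below concatenates the class probabilities of several models into one feature vector per sample and trains a toy meta model on it. All names (`stack_outputs`, `CentroidMetaModel`) and the numbers are hypothetical and not part of nkululeko; the nearest-centroid classifier is only a stand-in for whatever meta model would actually be used.

```python
from collections import defaultdict


def stack_outputs(model_outputs):
    """Concatenate the per-sample probability vectors of several models."""
    n_samples = len(model_outputs[0])
    if any(len(out) != n_samples for out in model_outputs):
        raise ValueError("all models must score the same samples")
    return [
        [p for out in model_outputs for p in out[i]]
        for i in range(n_samples)
    ]


class CentroidMetaModel:
    """Toy meta model (assumption): nearest class centroid in stacked space."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        # One centroid per class: the column-wise mean of its rows.
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((u - v) ** 2 for u, v in zip(a, b))

        return [
            min(self.centroids, key=lambda c: sq_dist(row, self.centroids[c]))
            for row in X
        ]


# Invented outputs of two base models for four training samples,
# columns = P(angry), P(sad).
model_a = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]]
model_b = [[0.7, 0.3], [0.6, 0.4], [0.4, 0.6], [0.1, 0.9]]
labels = ["angry", "angry", "sad", "sad"]

meta_X = stack_outputs([model_a, model_b])
meta_model = CentroidMetaModel().fit(meta_X, labels)

# One unseen sample, scored by both base models.
new_X = stack_outputs([[[0.85, 0.15]], [[0.55, 0.45]]])
prediction = meta_model.predict(new_X)
```

In practice the meta model should be trained on outputs for samples the base models have not seen during training, otherwise it learns from over-confident scores.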

bagustris commented 5 months ago

Yes. As you suggest, it will be easier to combine several results from previous experiments with a new module.

e.g.,

python3 -m nkululeko.ensemble result1 result2 result3

Here result1, result2, and result3 are the names given in [EXP][name]. If the results are not in the current directory, e.g. in tmp or elsewhere, the full command would be:

python3 -m nkululeko.ensemble /tmp/result1 /tmp/result2 /path/to/result3

felixbur commented 5 months ago

Agreed, that seems like a very good idea, as several modules (e.g. augtrain) already do single nkululeko.nkululeko calls.

bagustris commented 5 months ago

@felixbur sounds good!

So, I use the INI files from previous experiments to compute a new prediction (methods: majority_voting, mean, max, and sum). This requires a prediction file with class probabilities as input. The problem is that it seems we (or maybe only I) don't have that CSV file. I only have it when experimenting with ravdess using os + xgb and praat + svm. An example is train_test_dev_svm_praat_scale-standard.pkl.csv in the store dir, shown below.

                                    file     angry     happy   neutral       sad predicted
0    ./Actor_21/03-01-07-01-01-01-21.wav  0.520500  0.103500  0.208000  0.167500     angry
1    ./Actor_21/03-01-06-01-02-02-21.wav  0.426500  0.097000  0.200000  0.276000     angry
2    ./Actor_21/03-01-06-02-01-02-21.wav  0.450000  0.106000  0.208500  0.235000     angry
3    ./Actor_21/03-01-04-02-01-02-21.wav  0.454000  0.098500  0.208000  0.239000     angry
4    ./Actor_21/03-01-01-01-01-02-21.wav  0.415500  0.121500  0.201500  0.262000     angry
..                                   ...       ...       ...       ...       ...       ...
235  ./Actor_24/03-01-03-02-01-01-24.wav  0.496500  0.101500  0.200500  0.201000     angry
236  ./Actor_24/03-01-08-02-02-01-24.wav  0.429000  0.104000  0.192000  0.275000     angry
237  ./Actor_24/03-01-08-01-01-02-24.wav  0.403000  0.115500  0.195500  0.286000     angry
238  ./Actor_24/03-01-03-01-01-02-24.wav  0.421000  0.115500  0.202500  0.261000     angry
239  ./Actor_24/03-01-08-02-01-02-24.wav  0.461500  0.100000  0.200500  0.237500     angry
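To make the four methods concrete, here is a minimal sketch of how per-class probabilities from several such tables could be fused. It is not the code from the ensemble branch: the function names, the tie-break rule for majority voting, and the second (xgb) table are assumptions made up for illustration. Only the svm rows are taken from the table above.

```python
from collections import Counter

LABELS = ["angry", "happy", "neutral", "sad"]


def fuse_row(rows, method="mean", labels=LABELS):
    """Fuse one sample's probability rows (one dict per model) into a label."""
    if method == "majority_voting":
        # Each model votes for its most probable class.
        votes = Counter(max(labels, key=lambda lab: r[lab]) for r in rows)
        top = max(votes.values())
        tied = [lab for lab in labels if votes[lab] == top]
        # Tie-break (assumption): highest summed probability among tied labels.
        return max(tied, key=lambda lab: sum(r[lab] for r in rows))

    reducers = {"mean": lambda v: sum(v) / len(v), "max": max, "sum": sum}
    if method not in reducers:
        raise ValueError(f"unknown method: {method}")
    reduce_ = reducers[method]
    scores = {lab: reduce_([r[lab] for r in rows]) for lab in labels}
    return max(labels, key=lambda lab: scores[lab])


def ensemble(tables, method="mean", labels=LABELS):
    """Fuse several prediction tables that list the same samples in order."""
    return [fuse_row(rows, method, labels) for rows in zip(*tables)]


# First two rows of the svm + praat table above.
svm_table = [
    {"angry": 0.5205, "happy": 0.1035, "neutral": 0.208, "sad": 0.1675},
    {"angry": 0.4265, "happy": 0.097, "neutral": 0.2, "sad": 0.276},
]
# Invented values standing in for a second model's predictions.
xgb_table = [
    {"angry": 0.30, "happy": 0.05, "neutral": 0.05, "sad": 0.60},
    {"angry": 0.70, "happy": 0.10, "neutral": 0.10, "sad": 0.10},
]

results = {
    m: ensemble([svm_table, xgb_table], method=m)
    for m in ["majority_voting", "mean", "max", "sum"]
}
```

With equal weights, mean and sum always pick the same class; max can differ because a single confident model dominates, as it does for the first sample here.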

An example of use is shown below:

$ python3 -m nkululeko.ensemble bagus_tests/exp_ravdess_os_xgb.ini bagus_tests/exp_ravdess_praat_svm.ini --method mean
DEBUG ensemble: running exp_ravdess_os_xgb from config bagus_tests/exp_ravdess_os_xgb.ini, nkululeko version 0.86.7
Loading predictions from ./bagus_tests/results/exp_ravdess_os_xgb/./store//train_test_dev_xgb_os_scale-standard.pkl.csv
DEBUG ensemble: running exp_ravdess_praat_svm from config bagus_tests/exp_ravdess_praat_svm.ini, nkululeko version 0.86.7
Loading predictions from ./bagus_tests/results/exp_ravdess_praat_svm/./store//train_test_dev_svm_praat_scale-standard.pkl.csv
Ensemble predictions saved to: ensemble_voting.csv
Ensemble done, used 0.01 seconds
DONE
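The command-line shape used above (several INI files plus a --method option) could be parsed roughly as follows. This is only a sketch and not the actual parser of the ensemble module: the program name, the default method, and the helper `experiment_name` are assumptions. The only facts taken from the thread are the four method names and the [EXP][name] entry of the INI file.

```python
import argparse
import configparser

METHODS = ["majority_voting", "mean", "max", "sum"]


def build_parser():
    """Parser for one or more experiment INI files and a fusion method."""
    parser = argparse.ArgumentParser(
        prog="ensemble_sketch",
        description="Combine predictions of finished experiments.",
    )
    parser.add_argument(
        "configs", nargs="+", help="INI files of previous experiments"
    )
    # The default is an assumption; the thread does not state one.
    parser.add_argument("--method", choices=METHODS, default="majority_voting")
    return parser


def experiment_name(ini_text):
    """Read the experiment name from the [EXP] section of an INI string."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    return config["EXP"]["name"]


args = build_parser().parse_args(
    [
        "bagus_tests/exp_ravdess_os_xgb.ini",
        "bagus_tests/exp_ravdess_praat_svm.ini",
        "--method",
        "mean",
    ]
)
name = experiment_name("[EXP]\nname = exp_ravdess_os_xgb\n")
```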

The working code is in my ensemble branch. I think that CSV files should be always returned after each experiment, right (it is different from save_test)? That way the code will work regardless of dataset type (audformat, csv) or other parameters, but currently only for classification.

felixbur commented 5 months ago

I think that CSV files should be always returned after each experiment right

Sure, fine with me. I guess it makes sense that this module requires additional functionality from other modules, i.e. the nkululeko and augtrain modules.