Closed (felixbur closed this 4 months ago)
Yes. As you suggest, it will be easier to combine several results from previous experiments with a new module.
e.g.,
python3 -m nkululeko.ensemble result1 result2 result3
where result1, result2, and result3 are the names of the experiments ([EXP][name]). If the results are not in the current directory, e.g., in /tmp or elsewhere, the full command will be:
python3 -m nkululeko.ensemble /tmp/result1 /tmp/result2 /path/to/result3
Agreed, that seems like a very good idea, as several modules (e.g. augtrain) already make single nkululeko.nkululeko calls.
@felixbur sounds good!
So, I use the INI files from previous experiments to calculate a new prediction (methods: majority_voting, mean, max, and sum). This requires prediction files with probabilities as input. The problem is that it seems we (or maybe only I) don't have that CSV file. I only have it when experimenting with ravdess using os + xgb and praat + svm. An example is train_test_dev_svm_praat_scale-standard.pkl.csv in the store dir, shown below.
     file                                 angry     happy     neutral   sad       predicted
0    ./Actor_21/03-01-07-01-01-01-21.wav  0.520500  0.103500  0.208000  0.167500  angry
1    ./Actor_21/03-01-06-01-02-02-21.wav  0.426500  0.097000  0.200000  0.276000  angry
2    ./Actor_21/03-01-06-02-01-02-21.wav  0.450000  0.106000  0.208500  0.235000  angry
3    ./Actor_21/03-01-04-02-01-02-21.wav  0.454000  0.098500  0.208000  0.239000  angry
4    ./Actor_21/03-01-01-01-01-02-21.wav  0.415500  0.121500  0.201500  0.262000  angry
..   ...                                  ...       ...       ...       ...       ...
235  ./Actor_24/03-01-03-02-01-01-24.wav  0.496500  0.101500  0.200500  0.201000  angry
236  ./Actor_24/03-01-08-02-02-01-24.wav  0.429000  0.104000  0.192000  0.275000  angry
237  ./Actor_24/03-01-08-01-01-02-24.wav  0.403000  0.115500  0.195500  0.286000  angry
238  ./Actor_24/03-01-03-01-01-02-24.wav  0.421000  0.115500  0.202500  0.261000  angry
239  ./Actor_24/03-01-08-02-01-02-24.wav  0.461500  0.100000  0.200500  0.237500  angry
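To make the four combination methods (majority_voting, mean, max, sum) concrete, here is a minimal sketch of how they could work on per-class probabilities like the ones in the table. It is not the code from the ensemble branch: the function name `ensemble`, the `Row` type, and the in-memory layout (one list of rows per model, same sample order and same class labels in every model) are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical in-memory form of a prediction table: one dict per sample,
# mapping class label -> probability. Every model is assumed to list the
# same samples in the same order with the same class labels.
Row = dict[str, float]


def ensemble(tables: list[list[Row]], method: str = "mean") -> list[str]:
    """Combine the class probabilities of several models into one label per sample."""
    labels = list(tables[0][0].keys())
    predictions = []
    for rows in zip(*tables):  # rows = the same sample as seen by each model
        if method == "majority_voting":
            # Each model votes for its most probable class; on a tie the
            # class that received its first vote earliest wins.
            votes = [max(labels, key=lambda c: row[c]) for row in rows]
            predictions.append(Counter(votes).most_common(1)[0][0])
            continue
        if method == "mean":
            scores = {c: sum(r[c] for r in rows) / len(rows) for c in labels}
        elif method == "max":
            scores = {c: max(r[c] for r in rows) for c in labels}
        elif method == "sum":
            scores = {c: sum(r[c] for r in rows) for c in labels}
        else:
            raise ValueError(f"unknown method: {method}")
        # The ensemble label is the class with the highest combined score.
        predictions.append(max(scores, key=scores.get))
    return predictions
```

Note that mean and sum always pick the same class when every model contributes to every sample, since they differ only by a constant factor. With only two models, majority_voting can tie on every disagreement, so it is more informative with three or more models.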
An example of use is below:
$ python3 -m nkululeko.ensemble bagus_tests/exp_ravdess_os_xgb.ini bagus_tests/exp_ravdess_praat_svm.ini --method mean
DEBUG ensemble: running exp_ravdess_os_xgb from config bagus_tests/exp_ravdess_os_xgb.ini, nkululeko version 0.86.7
Loading predictions from ./bagus_tests/results/exp_ravdess_os_xgb/./store//train_test_dev_xgb_os_scale-standard.pkl.csv
DEBUG ensemble: running exp_ravdess_praat_svm from config bagus_tests/exp_ravdess_praat_svm.ini, nkululeko version 0.86.7
Loading predictions from ./bagus_tests/results/exp_ravdess_praat_svm/./store//train_test_dev_svm_praat_scale-standard.pkl.csv
Ensemble predictions saved to: ensemble_voting.csv
Ensemble done, used 0.01 seconds
DONE
The working code is in my branch ensemble. I think that CSV files should always be returned after each experiment, right (this is different from save_test)? Then the code will work regardless of dataset type (audformat, csv) or other parameters, but currently only for classification.
I think that CSV files should be always returned after each experiment right
Sure, fine with me. I guess it makes sense that this module requires additional functionality from other modules, i.e. the nkululeko and augtrain modules.
Currently, as opposed to features, there is no way to combine models. The easiest way would be to try late fusion, i.e. take the output of several models as the input of a "meta model".
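As a rough illustration of the "meta model" idea, the sketch below concatenates the class probabilities of two base models per sample and trains a second-stage classifier on the stacked vectors. Everything here is hypothetical and not part of nkululeko: `stack` and `CentroidMetaModel` are illustrative names, and the nearest-centroid rule is only a deliberately tiny stand-in for whatever classifier would really be used as the meta model.

```python
from collections import defaultdict
from math import dist


def stack(prob_a: list[list[float]], prob_b: list[list[float]]) -> list[list[float]]:
    """Late-fusion input: concatenate the class probabilities of two models per sample."""
    return [a + b for a, b in zip(prob_a, prob_b)]


class CentroidMetaModel:
    """Tiny stand-in for a meta model: predicts the class with the nearest centroid."""

    def fit(self, features: list[list[float]], labels: list[str]) -> "CentroidMetaModel":
        grouped = defaultdict(list)
        for x, y in zip(features, labels):
            grouped[y].append(x)
        # Centroid = per-dimension mean of all stacked vectors of one class.
        self.centroids = {
            y: [sum(col) / len(col) for col in zip(*xs)] for y, xs in grouped.items()
        }
        return self

    def predict(self, features: list[list[float]]) -> list[str]:
        # Assign each stacked vector to the class whose centroid is closest
        # in Euclidean distance.
        return [
            min(self.centroids, key=lambda y: dist(x, self.centroids[y]))
            for x in features
        ]
```

A meta model of this kind should be fitted on base-model predictions for samples the base models were not trained on (e.g. a dev split); otherwise it learns from overconfident probabilities.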