MTG / gaia

C++ library to apply similarity measures and classifications on the results of audio analysis, including Python bindings. Together with Essentia it can be used to compute high-level descriptions of music.
http://essentia.upf.edu
GNU Affero General Public License v3.0

"WARNING: Removing {id} from GroundTruth as it could not be found in the merged dataset" message for every file on every classification attempt #83

Closed wallacethefmh closed 6 years ago

wallacethefmh commented 6 years ago

I'm trying to run train_model.py on a very small dataset (only 4 files) for initial testing, and I get this warning message for every file in every classification task. The ground truth labels each file as instrumental or not instrumental, with values of "0" or "1" (both files are shown below).

The program appears to start normally:

wboyd@dev-wboyd:~/essentia-orpheus$ python ~/gaia/src/bindings/pygaia/scripts/classification/train_model.py groundtruth_short.yaml filelist_short.yaml gaia-project-instrumental sig_dir results
Creating classification project gaia-project-instrumental
Successfully written gaia-project-instrumental
INFO     ClassificationTaskManager  |  Merging original base dataset...
[   INFO   ] Processing jobs number from 0 to 3 included (out of 4 without duplicate ids)
[   INFO   ] Will run using 2 threads
Merging file [4/4] (100% done)
[   INFO   ] All jobs finished, merging into dataset
Saving dataset...
Dataset successfully saved!
Preprocessing dataset chunk for /devhomes/wboyd/essentia-orpheus/sig_dir/datasets/training.db_0_4.partdb...

But then it starts each classification task and outputs this warning for every file, every time:

Your dataset has been saved at /devhomes/wboyd/essentia-orpheus/sig_dir/datasets/training.db
INFO     ClassificationTaskManager  |  Original dataset successfully created!
INFO     ClassificationTaskManager  |  Doing "raw" preprocessing...
INFO     ClassificationTaskManager  |  Doing "nobands" preprocessing...
INFO     ClassificationTaskManager  |  Doing "normalized" preprocessing...
INFO     ClassificationTaskManager  |  Doing "basic" preprocessing...
INFO     ClassificationTaskManager  |  Doing "lowlevel" preprocessing...
INFO     ClassificationTaskManager  |  Doing "gaussianized" preprocessing...
INFO     ClassificationTaskManager  |  --------------------------------------------------------------------------------
INFO     ClassificationTaskManager  |  Setup finished, starting classification tasks.
INFO     ClassificationTaskManager  |  Will use 2 concurrent jobs.
WARNING  ClassificationTask  |  Removing 1000 from GroundTruth as it could not be found in the merged dataset
WARNING  ClassificationTask  |  Removing 1000 from GroundTruth as it could not be found in the merged dataset
WARNING  ClassificationTask  |  Removing 1 from GroundTruth as it could not be found in the merged dataset
WARNING  ClassificationTask  |  Removing 1 from GroundTruth as it could not be found in the merged dataset
WARNING  ClassificationTask  |  Removing 10 from GroundTruth as it could not be found in the merged dataset
WARNING  ClassificationTask  |  Removing 1003 from GroundTruth as it could not be found in the merged dataset
WARNING  ClassificationTask  |  Removing 10 from GroundTruth as it could not be found in the merged dataset
WARNING  ClassificationTask  |  Removing 1003 from GroundTruth as it could not be found in the merged dataset
INFO     ClassificationTask  |  Running evaluation 0 for: /devhomes/wboyd/essentia-orpheus/sig_dir/results/training1942522285513231884 with classifier svm and dataset gaussianized
INFO     ClassificationTask  |      PID: 12310, parameters: {"kernel": "RBF", "C": 11, "preprocessing": "gaussianized", "type": "C-SVC", "classifier": "svm", "gamma": -11}
INFO     ClassificationTask  |  Running evaluation 0 for: /devhomes/wboyd/essentia-orpheus/sig_dir/results/training-308123546098320560 with classifier svm and dataset gaussianized
INFO     ClassificationTask  |      PID: 12311, parameters: {"kernel": "RBF", "C": 11, "preprocessing": "gaussianized", "type": "C-SVC", "classifier": "svm", "gamma": -9}
ERROR    ClassificationTask  |  While doing evaluation with param = {'kernel': 'RBF', 'C': 11, 'preprocessing': 'gaussianized', 'type': 'C-SVC', 'classifier': 'svm', 'gamma': -9}
evaluation = [{'type': 'nfoldcrossvalidation', 'nfold': 5}]
ERROR    ClassificationTask  |  While doing evaluation with param = {'kernel': 'RBF', 'C': 11, 'preprocessing': 'gaussianized', 'type': 'C-SVC', 'classifier': 'svm', 'gamma': -11}
evaluation = [{'type': 'nfoldcrossvalidation', 'nfold': 5}]
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/gaia2/classification/classificationtask.py", line 195, in <module>
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/gaia2/classification/classificationtask.py", line 195, in <module>
    task.run(className, outfilename, param, dsname, gtname, evalconfig)
  File "/usr/local/lib/python2.7/dist-packages/gaia2/classification/classificationtask.py", line 177, in run
        confusion = evaluateNfold(evalparam['nfold'], ds, gt, trainerFun, **trainingparam)
task.run(className, outfilename, param, dsname, gtname, evalconfig)
  File "/usr/local/lib/python2.7/dist-packages/gaia2/classification/classificationtask.py", line 177, in run
  File "/usr/local/lib/python2.7/dist-packages/gaia2/classification/evaluation.py", line 116, in evaluateNfold
    classifier = trainingFunc(trainds, traingt, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/gaia2/classification/classifier_SVM.py", line 40, in train_SVM
        p[groundTruth.className] = groundTruth[p.name()]
confusion = evaluateNfold(evalparam['nfold'], ds, gt, trainerFun, **trainingparam)
KeyError: '1'
  File "/usr/local/lib/python2.7/dist-packages/gaia2/classification/evaluation.py", line 116, in evaluateNfold
    classifier = trainingFunc(trainds, traingt, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/gaia2/classification/classifier_SVM.py", line 40, in train_SVM
    p[groundTruth.className] = groundTruth[p.name()]
KeyError: '1'
WARNING  ClassificationTask  |  Removing 1000 from GroundTruth as it could not be found in the merged dataset
WARNING  ClassificationTask  |  Removing 1 from GroundTruth as it could not be found in the merged dataset

And so on: the warnings and errors repeat until it finishes.

The project file is autogenerated; I have not modified it. I am using the master branch as cloned at the time of writing.

groundtruth:

wboyd@dev-wboyd:~/essentia-orpheus$ cat groundtruth_short.yaml
className: "training"
groundTruth: {1: "0", 10: "0", 1000: "0", 1003: "0"}
type: "singleClass"
version: "1.0"

filelist:

wboyd@dev-wboyd:~/essentia-orpheus$ cat filelist_short.yaml
{1: "sig_dir/D/Dean_Martin/Amore/test.wav.sig",
  10: "sig_dir/D/Dean Martin/Amore/10 All I Do Is Dream of You.wav.sig",
  1000: "sig_dir/C/Carpenters/The Carpenters_ The Singles 1969-1981 (R/04 We've Only Just Begun.m4a.sig",
  1003: "sig_dir/C/Compilations/The Complete Motown Singles Vol. 7_ 1967/2-19 All I Need.m4a.sig"}

I would be very grateful for some guidance, thanks in advance.

wallacethefmh commented 6 years ago

In case someone else ever has this issue...

I figured this out after a lot of tinkering. I changed the ids in the filelist and groundtruth so that they parse as strings instead of numbers (I'm guessing that was the issue):

{test/1: "sig_dir/D/Dean_Martin/Amore/test.wav.sig",
  test/10: "sig_dir/D/Dean Martin/Amore/10 All I Do Is Dream of You.wav.sig",
  test/1000: "sig_dir/C/Carpenters/The Carpenters_ The Singles 1969-1981 (R/04 We've Only Just Begun.m4a.sig",
  test/1003: "sig_dir/C/Compilations/The Complete Motown Singles Vol. 7_ 1967/2-19 All I Need.m4a.sig"}

and

className: "training"
groundTruth: {test/1: "0", test/10: "0", test/1000: "0", test/1003: "0"}
type: "singleClass"
version: "1.0"
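
For anyone who wants to see why this might matter, here is a minimal sketch of the mismatch I suspect. It is not Gaia code. It assumes the YAML loader turns unquoted keys such as 1 or 1000 into integers, while the point names in the merged dataset are strings. The helper names (missing_ids, stringify_keys) are made up for illustration.

```python
# Hypothetical illustration only; none of this is taken from Gaia.
# Assumption: unquoted YAML keys like 1 or 1000 are loaded as ints,
# while point names in the merged dataset are strings.

ground_truth = {1: "0", 10: "0", 1000: "0", 1003: "0"}  # keys as ints
point_names = ["1", "10", "1000", "1003"]               # names as strings


def missing_ids(gt, names):
    """Return the ground-truth ids that match no point name exactly."""
    known = set(names)
    return [key for key in gt if key not in known]


def stringify_keys(gt):
    """Return a copy of the ground truth with every id converted to str."""
    return {str(key): value for key, value in gt.items()}


# With int keys, no id matches a string point name.
print(missing_ids(ground_truth, point_names))

# Looking up a point name in the int-keyed dict fails with KeyError: '1',
# the same kind of error that appears in the traceback.
try:
    ground_truth["1"]
except KeyError as err:
    print("KeyError:", err)

# Once the keys are strings, every id is found.
fixed = stringify_keys(ground_truth)
print(missing_ids(fixed, point_names))
```

If that assumption holds, a prefix like test/ works simply because it forces the id to be read as a string. Quoting the ids in the YAML ("1": ...) should have the same effect on parsing, though I have not checked how Gaia handles it.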