MTG / gaia

C++ library to apply similarity measures and classifications on the results of audio analysis, including Python bindings. Together with Essentia it can be used to compute high-level descriptions of music.
http://essentia.upf.edu
GNU Affero General Public License v3.0

Gaia SVM weird accuracy scores #80

Closed: loretoparisi closed this issue 5 years ago

loretoparisi commented 6 years ago

I'm using the high-level audio feature extraction together with the Gaia classifiers, i.e. the built-in SVM models. I'm getting weird results on my audio dataset for several classifiers, such as voice_instrumental. I'm not sure whether this is caused by the audio input format.

This is what my audio stream looks like, as reported by ffmpeg:

```json
{
  "index": 0,
  "codec_name": "mp3",
  "codec_long_name": "MP3 (MPEG audio layer 3)",
  "codec_type": "audio",
  "codec_time_base": "1/44100",
  "codec_tag_string": "[0][0][0][0]",
  "codec_tag": "0x0000",
  "sample_fmt": "fltp",
  "sample_rate": "44100",
  "channels": 2,
  "channel_layout": "stereo",
  "bits_per_sample": 0,
  "r_frame_rate": "0/0",
  "avg_frame_rate": "0/0",
  "time_base": "1/14112000",
  "start_pts": 353600,
  "start_time": "0.025057",
  "duration_ts": 3707412480,
  "duration": "262.713469",
  "bit_rate": "128000",
  "tags": {
    "encoder": "Lavc58.18"
  }
}
```
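A quick way to rule out the input format is to check the stream properties programmatically. The sketch below parses an ffprobe-style stream description like the one above and reports anything unusual. The expected values (44100 Hz, mono or stereo) and the helper name `check_stream` are assumptions for illustration, not requirements documented by Essentia or Gaia.

```python
import json

# Assumption (not from Essentia/Gaia docs): analysis expects 44.1 kHz audio
# with one or two channels; adjust to your extractor configuration.
EXPECTED_SAMPLE_RATE = 44100
MAX_CHANNELS = 2


def check_stream(stream: dict) -> list:
    """Return human-readable warnings for an ffprobe-style audio stream dict."""
    warnings = []
    if stream.get("codec_type") != "audio":
        warnings.append(f"not an audio stream: {stream.get('codec_type')!r}")
    # ffprobe reports sample_rate and duration as strings, channels as an int.
    sample_rate = int(stream.get("sample_rate", 0))
    if sample_rate != EXPECTED_SAMPLE_RATE:
        warnings.append(f"sample rate {sample_rate} Hz, expected {EXPECTED_SAMPLE_RATE} Hz")
    channels = int(stream.get("channels", 0))
    if not 1 <= channels <= MAX_CHANNELS:
        warnings.append(f"unexpected channel count: {channels}")
    duration = float(stream.get("duration", 0.0))
    if duration <= 0:
        warnings.append("missing or zero duration")
    return warnings


# Subset of the stream description from this issue.
stream_json = """
{
  "codec_name": "mp3",
  "codec_type": "audio",
  "sample_rate": "44100",
  "channels": 2,
  "duration": "262.713469",
  "bit_rate": "128000"
}
"""
print(check_stream(json.loads(stream_json)))  # prints []
```

An empty report only means the stream looks like ordinary 44.1 kHz stereo audio; it does not exclude problems in the models themselves.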
dbogdanov commented 6 years ago

Can you give more details? What exactly is weird in the results? You can see the expected accuracy of the models here: http://acousticbrainz.org/datasets/accuracy

loretoparisi commented 6 years ago

@dbogdanov Hello! For example, for most non-English songs I get an "instrumental" value with a very high probability (>0.9), even though the track is not instrumental. That is why I wondered whether my input file format is wrong.
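The >0.9 value is presumably the per-track probability of the predicted class rather than the model's accuracy. To quantify how often this happens, one could scan the high-level outputs and flag confident "instrumental" predictions. The sketch below assumes a result layout of the form `{"highlevel": {"voice_instrumental": {"value": ..., "probability": ...}}}`; check it against your own extractor JSON. The track names, probabilities, and the helper `confident_predictions` are hypothetical.

```python
# Hypothetical high-level results keyed by track name; the nested layout
# ("highlevel" -> classifier -> "value"/"probability") is an assumption.
results = {
    "track_a.mp3": {"highlevel": {"voice_instrumental": {"value": "instrumental", "probability": 0.94}}},
    "track_b.mp3": {"highlevel": {"voice_instrumental": {"value": "voice", "probability": 0.81}}},
    "track_c.mp3": {"highlevel": {"voice_instrumental": {"value": "instrumental", "probability": 0.62}}},
}


def confident_predictions(results, classifier="voice_instrumental",
                          label="instrumental", threshold=0.9):
    """Return sorted track names predicted as `label` with probability > threshold."""
    flagged = []
    for track, data in results.items():
        pred = data.get("highlevel", {}).get(classifier)
        if pred is None:
            continue  # this classifier is missing for the track
        if pred["value"] == label and pred["probability"] > threshold:
            flagged.append(track)
    return sorted(flagged)


print(confident_predictions(results))  # prints ['track_a.mp3']
```

Listening to the flagged tracks, or cross-checking them against known vocal content, shows whether these confident outputs are genuine errors.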

dbogdanov commented 5 years ago

I can't say for sure, but this may be a robustness issue with the models, or the dataset we trained them on may not cover non-English vocal music well enough.
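One way to test the dataset-coverage hypothesis on your own data is to hand-label a sample of tracks and compare voice_instrumental accuracy across language groups, then set the numbers against the published model accuracies. The sample data and the helper `accuracy_by_group` below are hypothetical; this is a plain per-group accuracy computation, not a Gaia or Essentia API.

```python
from collections import defaultdict

# Hypothetical hand-labeled sample: (language, true label, predicted label).
samples = [
    ("en", "voice", "voice"),
    ("en", "voice", "voice"),
    ("en", "instrumental", "instrumental"),
    ("es", "voice", "instrumental"),
    ("es", "voice", "voice"),
    ("it", "voice", "instrumental"),
]


def accuracy_by_group(samples):
    """Return {group: fraction of samples whose prediction matches the true label}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in samples:
        total[group] += 1
        correct[group] += int(truth == predicted)
    return {group: correct[group] / total[group] for group in total}


for group, acc in sorted(accuracy_by_group(samples).items()):
    print(f"{group}: {acc:.2f}")  # en: 1.00, es: 0.50, it: 0.00
```

With a realistically sized sample, a markedly lower accuracy on non-English vocal tracks would support the coverage explanation over an input-format problem.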