fkubota / kaggle-Cornell-Birdcall-Identification

Repository for the Cornell Birdcall Identification competition
MIT License
38 stars, 19 forks

Summarize the differences between audio tagging and SED #85

Closed fkubota closed 4 years ago

fkubota commented 4 years ago

The sound event detection (SED) task is different from the tasks in past audio competitions on Kaggle. The task in Freesound Audio Tagging 2019 and the Freesound General-Purpose Audio Tagging Challenge was audio tagging, which requires a clip-level prediction. The task in the TensorFlow Speech Recognition Challenge was speech recognition: predicting which speech command is in each audio clip (in a sense similar to audio tagging, because it also requires only a clip-level prediction).

In this competition, we need to provide a prediction for each 5-second chunk of the site_1 and site_2 data, and a clip-level prediction for the site_3 data. Chunk-level prediction can be treated as an audio tagging task if we treat each chunk as a short audio clip, but we can also use an SED approach.
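
To make the difference concrete, here is a minimal sketch of how an SED-style output could be reduced to the two prediction formats above. It assumes a hypothetical model that returns frame-level probabilities of shape `(n_frames, n_classes)`. The function names, the frame rate, the 0.5 threshold, and the use of max pooling are all illustrative assumptions, not something taken from the competition or from this repository.

```python
import numpy as np


def framewise_to_chunks(frame_probs, frames_per_second, chunk_seconds=5.0):
    """Max-pool (n_frames, n_classes) probabilities into fixed-length chunks."""
    frames_per_chunk = int(round(frames_per_second * chunk_seconds))
    if frames_per_chunk < 1:
        raise ValueError("chunk must span at least one frame")
    starts = range(0, frame_probs.shape[0], frames_per_chunk)
    # The last chunk may be shorter than the others if the clip length
    # is not a multiple of chunk_seconds.
    return np.stack([frame_probs[s:s + frames_per_chunk].max(axis=0) for s in starts])


def framewise_to_clip(frame_probs):
    """Max-pool over all frames to get one prediction per class for the clip."""
    return frame_probs.max(axis=0)


# Toy input (assumed values): a 10-second clip, 2 frames per second, 2 classes.
frame_probs = np.zeros((20, 2))
frame_probs[3, 0] = 0.9    # class 0 is active around 1.5 s
frame_probs[15, 1] = 0.8   # class 1 is active around 7.5 s

chunk_probs = framewise_to_chunks(frame_probs, frames_per_second=2)
clip_probs = framewise_to_clip(frame_probs)

print(chunk_probs >= 0.5)  # per-chunk labels, shape (2, 2): site_1 / site_2 style
print(clip_probs >= 0.5)   # per-clip labels, shape (2,): site_3 style
```

An audio tagging model would produce only something like `clip_probs` (or one such vector per chunk if each chunk is fed in as its own short clip), while an SED model keeps the time axis and lets us aggregate afterwards. Max pooling is just one possible aggregation; mean or attention-weighted pooling would fit the same interface.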