MTG / freesound-datasets

A platform for the collaborative creation of open audio collections labeled by humans and based on Freesound content.
https://annotator.freesound.org/
GNU Affero General Public License v3.0

Annotation tasks brainstorming for FSD #35

Open edufonseca opened 7 years ago

edufonseca commented 7 years ago

Currently, we have only one task available for FSD: a validation task, which consists of validating annotations that were automatically generated through a mapping. More specifically, for a given sound category, the rater is prompted with the question: "Is [sound category] present in the following sounds?"

While at the moment this is the only task, hopefully we will have more in the future. This issue is to brainstorm about other possible tasks of interest. For example:

  1. Define timestamps (start and end times, or onsets and offsets) for the instances of acoustic events within an audio sample. The validation task we already have lets us confirm the presence of a sound category in an audio sample, but in many cases the samples are relatively long (up to 90s), so we have no knowledge of when exactly the sound occurs. Such cases are referred to as weakly labeled data. Defining exact timestamps would turn them into strongly labeled data while enabling evaluation for other tasks, e.g., detection of acoustic events in a continuous audio stream.
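To make the weak vs. strong distinction concrete, here is a minimal sketch of what such an annotation record could look like (the class and field names are illustrative, not an actual FSD schema):

```python
from dataclasses import dataclass

@dataclass
class EventAnnotation:
    sound_id: int    # Freesound clip identifier
    category: str    # ontology node, e.g. "Dog bark"
    onset: float     # event start time in seconds
    offset: float    # event end time in seconds

# A weak label only asserts presence somewhere in the clip; it can be
# modeled as an annotation spanning the whole clip.
weak = EventAnnotation(sound_id=123, category="Dog bark", onset=0.0, offset=90.0)

# After the timestamping task, the same clip yields strong labels:
strong = EventAnnotation(sound_id=123, category="Dog bark", onset=12.3, offset=14.1)
```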
jordipons commented 7 years ago

For some ideas on how to define more tasks, this article from the ImageNet authors can be inspiring: https://arxiv.org/pdf/1409.0575.pdf

The image classification annotation process in the paper is similar to our validation task.

And the object detection annotation process in the paper is similar to defining timestamps for the instances of acoustic events within an audio sample.

Furthermore, they discuss the case of single-object localization as a simpler proxy for the object detection task. Apparently, this helped them understand how to better approach the annotation process for object detection (which can involve many objects of the same class, and is therefore much more challenging).

ffont commented 7 years ago

We could probably include a single-resource annotation task in which users manually annotate a single audio clip using the concepts of the ontology. This would be more of a "generation" task than a validation one, but it could be useful in some particular cases, so it would be interesting to keep it on the roadmap. It would also align with one of the AudioCommons deliverables we have to work on.

xavierfav commented 7 years ago

So far, we have proposed two tasks that would be interesting to have on our platform:

  1. A single-resource annotation task in which users annotate a single audio clip using concepts of the ontology (proposed by ffont). We could use our text-based DNN classifier, which returns the probability of an audio clip belonging to a category (sergiooramas's models), to guide the user: proposing the most likely concepts for the clip at hand would speed up the process and spare the user from searching through the whole ontology (a rough sketch of this follows the list).

  2. A task in which users define timestamps of acoustic events within an audio clip. Again, we can propose the most likely concepts to the user and/or use the annotations validated in our current validation task. For the front end, we can take inspiration from the similarity annotator made by oriolromani, based on wavesurfer.js.
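As a rough illustration of the classifier-guided suggestion in task 1, here is a sketch assuming the classifier exposes per-category probabilities (the function name, interface, and thresholds are hypothetical, not the actual model API):

```python
def suggest_concepts(probabilities, top_k=5, threshold=0.1):
    """Rank ontology concepts by classifier probability and return the
    top candidates to show the annotator first.

    `probabilities` is assumed to map concept names to scores in [0, 1],
    e.g. as produced by a wrapper around the text-based DNN classifier.
    """
    candidates = [(c, p) for c, p in probabilities.items() if p >= threshold]
    candidates.sort(key=lambda cp: cp[1], reverse=True)
    return [concept for concept, _ in candidates[:top_k]]

# Example: the annotator sees these concepts first instead of searching
# through the whole ontology.
probs = {"Dog bark": 0.82, "Speech": 0.40, "Rain": 0.05, "Siren": 0.12}
print(suggest_concepts(probs))  # ['Dog bark', 'Speech', 'Siren']
```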

ffont commented 7 years ago

Regarding the first task, look at chapter 6 of my thesis, as this is exactly what I was doing (although I'm not happy with the result, as it should be easier for users). We have to develop this task for one of the AudioCommons deliverables, so we should give it significant priority ;).

edufonseca commented 7 years ago

In the current validation task, we seek annotator agreement to build ground truth for the dataset. How can we do this for the proposed new tasks?

Considering the two aforementioned tasks, which are more about "generation", i.e., the user generates data or annotations, either by defining timestamps or by assigning labels to files or events: is majority voting feasible to provide useful answers in these cases?
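For reference, majority voting in its simplest form looks something like this (the vote labels and quorum are assumptions for illustration, not the platform's actual values):

```python
from collections import Counter

def majority_vote(votes, quorum=3):
    """Aggregate per-annotation votes (e.g. "present"/"not present")
    into a ground-truth decision, or None if there is no quorum or
    no strict majority yet."""
    if len(votes) < quorum:
        return None  # not enough raters yet
    winner, n = Counter(votes).most_common(1)[0]
    return winner if n > len(votes) / 2 else None

print(majority_vote(["present", "present", "not present"]))  # 'present'
```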

xavierfav commented 7 years ago

For the first task, we should check ffont's thesis and see if there is some relevant info. What I see in this case is that adding a label to a sound clip could count as a vote for an annotation, so we could combine this with the current validation task to promote some annotations to ground truth. Alternatively, we could propose the same sound to different people, and take as ground truth the labels that everyone added to the sound clip.

Regarding the second task, I've been thinking about it a bit and I think we can again rely on redundancy of answers. Or we could propose a task for validating/correcting timestamps, so that the ground truth would be generated through several human generation/correction steps.
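As a sketch of how that redundancy could be aggregated for timestamps: keep only the time regions marked by at least a minimum number of annotators. The (onset, offset) representation and the agreement threshold are assumptions, not an agreed design:

```python
def consensus_segments(annotations, min_agree=2):
    """Merge (onset, offset) intervals from several annotators and keep
    the spans covered by at least `min_agree` of them."""
    # Sweep line over interval boundaries: +1 at onsets, -1 at offsets.
    events = sorted([(on, +1) for on, _ in annotations] +
                    [(off, -1) for _, off in annotations])
    segments, depth, start = [], 0, None
    for t, delta in events:
        depth += delta
        if depth >= min_agree and start is None:
            start = t
        elif depth < min_agree and start is not None:
            segments.append((start, t))
            start = None
    return segments

# Three raters roughly agree on one event around 2-5 s:
print(consensus_segments([(2.0, 5.0), (2.2, 4.8), (1.9, 5.1)]))
# [(2.0, 5.0)] -- the span where at least two raters overlap
```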

ffont commented 7 years ago

For the first task, in my thesis I did something very similar (see chapter 6). I think what you propose is OK, but bear in mind that the system should not rely on having more than one annotator per resource, nor on the resource having previous annotations. The system I built relies on the first tags that the user introduces. Nevertheless, I'd say that for the first iteration (and the AudioCommons deliverable) we should work on something very simple, like a well-designed form in which some categories from the ontology can be chosen and some fields are then shown or hidden depending on these categories; we can introduce more knowledge in future iterations.
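A minimal sketch of the kind of conditional form described here, with made-up categories and fields (none of these names come from the ontology or the platform):

```python
# Hypothetical mapping from a chosen category to the extra form fields
# that should become visible; everything here is illustrative.
FIELDS_BY_CATEGORY = {
    "Music": ["instrument", "tempo"],
    "Speech": ["language", "number_of_speakers"],
    "Sound effects": ["material", "action"],
}

def visible_fields(selected_categories):
    """Return the union of extra fields to show, in selection order."""
    fields = []
    for category in selected_categories:
        for field in FIELDS_BY_CATEGORY.get(category, []):
            if field not in fields:
                fields.append(field)
    return fields

print(visible_fields(["Music", "Speech"]))
# ['instrument', 'tempo', 'language', 'number_of_speakers']
```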