MTG / freesound-datasets

A platform for the collaborative creation of open audio collections labeled by humans and based on Freesound content.
https://annotator.freesound.org/
GNU Affero General Public License v3.0

Present and Predominant is ambiguous #148

Closed xavierfav closed 6 years ago

xavierfav commented 6 years ago

In our validation task, we currently ask people to validate the presence of a sound category in audio samples according to one of these four answers:

The definition given for Present and predominant is different from the common meaning of predominant (~ being superior). The definition we give corresponds to the sound event being the only thing present in the audio clip, rather than the most important or dominant one.

Moreover, many people from our lab did not understand it as intended, perhaps because they did not read the definition we give carefully, and perhaps because they already had a pretty clear idea of what predominant means in general.

I propose to change it to:

Limitation:

One could perhaps read this as meaning that several instances of the same sound event would not count as Present (solo), but I don't think that is the case. Maybe we can clarify the definition:

The type of sound described is clearly present and there are no other types of sound, with the exception of low/mild background noise. The same sound can be repeated several times.

ffont commented 6 years ago

The current definitions in the help page are as follows:

In the instructions page this is slightly different (changes in Present and predominant):

To me, the two different definitions given for Present and predominant are perfectly OK and carry the same ambiguity (namely, "what should be considered background noise?"). There will always be some ambiguity, and for me this is perfectly fine. Nevertheless, we should use a single definition rather than have it duplicated.

What would, however, be a significant change is renaming the labels as @xavierfav proposes (i.e. Present and predominant -> Present (solo)). His proposed labels seem to introduce a bias towards more isolated sounds in the first category. The real question here is: do we want this bias?

In my opinion, we don't need/want this bias. Ambiguity will always be there, and the cost of changing the labels (and the meaning of the category) now is quite high. In any analysis of the data, we would always need to distinguish between the PP votes from before the change and the "PP" votes from after it. Also, I think a "present and predominant" scenario is much more common in the real world than a "solo" scenario, so we risk many more sounds going to Present (mixed) at the expense of the sounds in Present (solo) being a bit more isolated. I don't really see this as a big advantage.

If the real problem here is that we don't know whether a sound is isolated or not, I think we could work on algorithms to estimate that. This would be very useful for many use cases, like building a clean dataset or searching in Freesound. It is also a very interesting research problem, and there is probably some existing work on it in speech. Possible ideas are to estimate the background noisiness, and to compare the frequency distributions of the different events occurring in a sound to estimate the likelihood of multiple sources being present. Of course these are just some ideas, but I think that doing something in this direction would be much more interesting and useful than changing the meaning of our current voting system to get more isolated sounds in the first category (while still having no certainty about the impact of that change). If we were to re-design the platform from scratch, we could argue about whether we want this bias towards isolated sounds, but I don't think changing it now will have many benefits.
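As a rough illustration of the first idea (estimating background noisiness), here is a minimal sketch; the function name, the percentiles, and the RMS-based heuristic are all assumptions for illustration, not anything implemented in the platform:

```python
import numpy as np
import librosa

def background_noisiness(path, frame_length=2048, hop_length=512):
    """Rough estimate of how clean a clip's background is.

    Compares a noise-floor estimate (a low percentile of frame-level
    RMS energy, in dB) against the loudest frames. A large gap suggests
    the sound event clearly rises above the background; a small gap
    suggests a noisy background or overlapping sources.
    """
    y, sr = librosa.load(path, sr=None, mono=True)
    rms = librosa.feature.rms(y=y, frame_length=frame_length,
                              hop_length=hop_length)[0]
    rms_db = librosa.amplitude_to_db(rms, ref=np.max)
    noise_floor = np.percentile(rms_db, 10)   # quiet frames ~ background
    event_level = np.percentile(rms_db, 95)   # loud frames ~ the event
    return event_level - noise_floor          # bigger gap -> cleaner clip
```

A threshold on this gap could then flag candidate "isolated" clips for the dataset-building use case, though the second idea (comparing frequency distributions of co-occurring events) would be needed to catch multiple loud sources.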

xavierfav commented 6 years ago

From our (limited) experience using the votes generated by the validation task, knowing that there is only one source in an audio clip seems (very) useful. It allows us to create "high quality" datasets, in the sense that we know we are not missing any acoustic source in an audio sample.
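Purely as an illustration of what we do with the votes, a clean subset could be built by keeping only the (clip, category) pairs whose votes unanimously say Present and predominant; the answer codes, vote records, and aggregation rule below are assumptions, not the platform's actual schema:

```python
from collections import Counter

# Hypothetical vote records: (clip_id, category, answer). "PP" stands for
# "Present and predominant", "PNP" for "Present but not predominant".
votes = [
    ("clip_1", "Bark", "PP"),
    ("clip_1", "Bark", "PP"),
    ("clip_1", "Bark", "PNP"),
    ("clip_2", "Bark", "PP"),
    ("clip_2", "Bark", "PP"),
]

def clean_subset(votes, min_votes=2, min_agreement=1.0):
    """Keep (clip, category) pairs with enough votes and unanimous PP."""
    tallies = {}
    for clip_id, category, answer in votes:
        tallies.setdefault((clip_id, category), Counter())[answer] += 1
    selected = []
    for key, counts in tallies.items():
        total = sum(counts.values())
        if total >= min_votes and counts["PP"] / total >= min_agreement:
            selected.append(key)
    return selected

print(clean_subset(votes))  # [('clip_2', 'Bark')]
```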

But I understand that the validation task is not the way to get this, and that changing it now is not a good idea. It seems we would need other tasks/tools to achieve such quality in our datasets.

The generation task (#134) is one of them; it allows adding labels that our automatic method did not propose. We are still thinking about what to propose, but one reasonable idea is to create two new, simpler tasks based on focusing on a single sound resource at a time: