Source of cough-speech-sneeze dataset

Yuening-Ma commented 5 months ago

Hello, I'm training a model for cough detection and would like to use the cough-speech-sneeze dataset from audeering datasets.

I found the following dataset description on this page: Dataset based on the publication of Shahin Amiriparian: “Amiriparian, S., Pugachevskiy, S., Cummins, N., Hantke, S., Pohjalainen, J., Keren, G., Schuller, B., 2017. CAST a database: Rapid targeted large-scale big data acquisition via small-world modelling of social media platforms, in: 2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII). IEEE, pp. 340–345. https://doi.org/10.1109/ACII.2017.8273622”

I have downloaded the dataset using audb (many thanks for the data!). However, I would like to know: is this the original dataset created by the authors of the paper above (Amiriparian et al.), or did audeering reorganize or modify the data? I need this information so that I can describe the data source correctly in my paper. Thanks again!

PS: How should I cite your work if I download the data with audb?

hagenw commented 5 months ago

Good point. For version 1.0.0 of the dataset we used the original data and labels, but in version 2.0.0 we added re-annotations for cough and sneeze and removed files that were marked as bad audio or as containing other sound classes; see https://github.com/audeering/cough-speech-sneeze/blob/main/2.0.0/publish.py.
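To check for yourself what changed between 1.0.0 and 2.0.0, one option is to compare the labels of the two versions file by file. The sketch below assumes you have already reduced each version's labels to a plain dict mapping file path to label; how to get there from the audb tables depends on their layout, which isn't covered in this thread. `diff_label_versions` is a hypothetical helper, not part of audb.

```python
from typing import Dict, List, Tuple


def diff_label_versions(
    old: Dict[str, str], new: Dict[str, str]
) -> Tuple[List[str], List[str], List[Tuple[str, str, str]]]:
    """Compare two {file: label} mappings (hypothetical helper).

    Returns (removed, added, relabelled):
    - removed: files present in `old` but missing from `new`
    - added: files present in `new` but missing from `old`
    - relabelled: (file, old_label, new_label) for files whose label changed
    All lists are sorted by file path.
    """
    removed = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    relabelled = [
        (path, old[path], new[path])
        for path in sorted(set(old) & set(new))
        if old[path] != new[path]
    ]
    return removed, added, relabelled
```

Files dropped in 2.0.0 would then show up in `removed`, and re-annotated cough/sneeze files whose label changed would show up in `relabelled`.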

You don't have to add a citation for audb, but if you want, you could use https://arxiv.org/abs/2303.00645

hagenw commented 5 months ago

BTW, the raw labels of our re-annotation are available at https://github.com/audeering/cough-speech-sneeze/blob/main/2.0.0/annotations/20210412-102437-cough-sneeze/20210412-102437_cough-and-sneeze_annotations-cough_sneeze.csv.
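For a quick overview of that CSV after downloading it, a minimal sketch with the standard `csv` module could count how often each label value appears. The column names of the file aren't described in this thread, so `label_column` is a placeholder you would set after looking at the header, and `count_labels` is a hypothetical helper.

```python
import csv
from collections import Counter
from typing import Iterable


def count_labels(lines: Iterable[str], label_column: str) -> Counter:
    """Count label values in CSV text (hypothetical helper).

    `lines` can be an open file object or any iterable of CSV lines;
    the first line must be the header row.
    """
    reader = csv.DictReader(lines)
    # Fail early with a readable message if the assumed column is absent.
    if reader.fieldnames is None or label_column not in reader.fieldnames:
        raise KeyError(
            f"column {label_column!r} not found; header is {reader.fieldnames}"
        )
    return Counter(row[label_column] for row in reader)
```

Usage, with a column name you would first have to confirm in the file's header:

    with open("20210412-102437_cough-and-sneeze_annotations-cough_sneeze.csv", newline="", encoding="utf-8") as f:
        print(count_labels(f, "label"))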

hagenw commented 5 months ago

You can also load the dataset with the original labels:

>>> import audb
>>> audb.versions("cough-speech-sneeze")
['1.0.0', '2.0.0', '2.0.1']

So, if you do:

>>> db = audb.load("cough-speech-sneeze", version="1.0.0")

You should get the original data and labels.

Yuening-Ma commented 5 months ago

Many thanks for your very timely reply! You really did a solid job; version 2.0 of the dataset that I downloaded is quite clean! I will read the code and the annotation file for more detail.

audeering / datasets

Source of cough-speech-sneeze dataset #13