The different counting of datasets

duduOliver commented 1 year ago

Thank you very much for your great contributions to the field of audio! I downloaded the WavCaps dataset from HuggingFace and unzipped it. However, when I count the audio files for each data source, the numbers differ slightly from the claimed ones. My statistics are attached below.

| Data Source | # audio claimed | # audio in json_files | # audio files found (unzipped) |
| --- | --- | --- | --- |
| FreeSound | 262300 | 262300 (all) | 214208 |
| BBC Sound Effects | 31291 | 31201 | 31201 |
| SoundBible | 1232 | 1232 | 1320 |
| AudioSet SL subset | 108317 | 108317 | 108317 |
| Total | 403140 | 403050 | 355046 |
| WavCaps | 403050 | | |

I found that the sequence of archives for FreeSound is discontinuous, and I wonder whether the index information in FreeSound.zip is out of date. Do you know how these differences were introduced? Also, is there an easy way to reconcile them? The info in the JSON files is not aligned with the audio sets, which means extra work during data preprocessing.

Thank you again for preparing the very meaningful datasets!

XinhaoMei commented 1 year ago

Hi, thanks for your message.

For FreeSound, some archives were not uploaded. I am very sorry about this. We will upload them as soon as possible. For SoundBible, audio clips that were filtered out during post-processing were included in the zip file; you can simply ignore them.
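For anyone hitting the same mismatch, here is a minimal sketch of how the JSON files could be compared against the unzipped audio, so that extra clips (as with SoundBible) can be skipped and missing ones (as with FreeSound) can be listed. It is not the authors' post-processing script. It assumes each JSON file has the layout `{"data": [{"id": ...}, ...]}` and that each audio file name (without extension) equals the clip `id`; the function names and the extension list are hypothetical, so adjust them to what you actually see on disk.

```python
import json
from pathlib import Path

# Assumption: audio files use one of these extensions.
AUDIO_EXTS = {".flac", ".wav", ".mp3"}


def load_json_ids(json_path):
    """Return the set of clip ids listed in one metadata JSON file.

    Assumes the layout {"data": [{"id": ...}, ...]}.
    """
    with open(json_path, encoding="utf-8") as f:
        entries = json.load(f)["data"]
    # Drop a trailing extension in case ids are stored as file names.
    return {Path(str(entry["id"])).stem for entry in entries}


def audio_ids_on_disk(audio_dir):
    """Return the set of file stems for all audio files under audio_dir."""
    return {
        p.stem
        for p in Path(audio_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTS
    }


def compare_json_to_audio(json_path, audio_dir):
    """List ids present in the JSON but not on disk, and the reverse."""
    listed = load_json_ids(json_path)
    found = audio_ids_on_disk(audio_dir)
    return {
        "missing": sorted(listed - found),  # in JSON, no audio file
        "extra": sorted(found - listed),    # audio file, not in JSON
    }
```

Clips reported under `extra` can be skipped when building a training list, and a non-empty `missing` list suggests that some archives were not downloaded or not uploaded. Note that ids containing a dot would be truncated by the extension stripping, so check a few ids by hand first.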

duduOliver commented 1 year ago

Thanks for your message! Great! Now I can relax and wait for your FreeSound update. So for BBC Sound Effects, the claimed number is simply the number of audio clips before post-processing, right? And you just removed some audio in post-processing. BTW, do you plan to release the post-processing scripts? I'll keep the issue open until you update the datasets, in case someone else has the same question. Good luck!

XinhaoMei commented 1 year ago

Hi, for the other data sources, please refer to the provided JSON files. The number of audio clips in BBC Sound Effects is 31201; 31291 is a typo. Sorry about this. The post-processing is the one we introduced in the paper. Some audio clips were filtered out during this process.

XinhaoMei commented 1 year ago

Hi, missing files have been uploaded to HuggingFace!

duduOliver commented 1 year ago

Thank you very much! I've verified that all the counts now match.

XinhaoMei / WavCaps

The different counting of datasets #13