XinhaoMei / WavCaps

This repository contains metadata of the WavCaps dataset and code for downstream tasks.

The different counting of datasets #13

Closed duduOliver closed 1 year ago

duduOliver commented 1 year ago
Thank you very much for your great contributions to the field of audio! I've downloaded the WavCaps dataset from HuggingFace and unzipped it. However, my counts for each data source differ slightly from the numbers you report. My statistics are as follows.

| Data Source | # audio claimed | # audio in json_files | # audio in ZIP |
|---|---|---|---|
| FreeSound | 262300 | 262300 (all) | 214208 |
| BBC Sound Effects | 31291 | 31201 | 31201 |
| SoundBible | 1232 | 1232 | 1320 |
| AudioSet SL subset | 108317 | 108317 | 108317 |
| Total | 403140 | 403050 | 355046 |
| WavCaps | 403050 | | |

I found that the sequence of archives in FreeSound is discontinuous, and I wonder whether the index information in FreeSound.zip is out of date. Do you know how these differences were introduced? Is there an easy way to reconcile them? The JSON files are not aligned with the audio sets, which means extra work during data preprocessing.

Thank you again for preparing the very meaningful datasets!
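
The comparison described above (entries in the JSON files versus audio files in the unzipped archives) can be sketched roughly as follows. This is only an illustration under assumptions not stated in the thread: that each JSON file has a top-level `data` list whose entries carry an `id` field, and that each audio file is named after that id. The helper names and the suffix list are hypothetical.

```python
import json
from pathlib import Path

# Assumption: audio files use one of these extensions.
AUDIO_SUFFIXES = (".flac", ".wav", ".mp3")


def normalize_id(raw_id):
    """Turn an id into a bare string, dropping a trailing audio extension if present."""
    text = str(raw_id)
    for suffix in AUDIO_SUFFIXES:
        if text.lower().endswith(suffix):
            return text[: -len(suffix)]
    return text


def ids_from_json(json_path):
    """Collect ids from a metadata file (assumed layout: {"data": [{"id": ...}, ...]})."""
    with open(json_path, encoding="utf-8") as handle:
        meta = json.load(handle)
    return {normalize_id(entry["id"]) for entry in meta["data"]}


def ids_from_audio_dir(audio_dir):
    """Collect file stems of all audio files found recursively under audio_dir."""
    return {
        path.stem
        for path in Path(audio_dir).rglob("*")
        if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES
    }


def compare_ids(json_ids, audio_ids):
    """Report ids listed without audio, audio without a listing, and the overlap size."""
    return {
        "missing_audio": sorted(json_ids - audio_ids),
        "extra_audio": sorted(audio_ids - json_ids),
        "matched": len(json_ids & audio_ids),
    }
```

Calling `compare_ids(ids_from_json(...), ids_from_audio_dir(...))` once per data source would show whether a mismatch comes from missing archives (`missing_audio`) or from unlisted files in the ZIP (`extra_audio`).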

XinhaoMei commented 1 year ago

Hi, thanks for your message.

For FreeSound, some archives were not uploaded. I am very sorry for this; we will upload them as soon as possible. For SoundBible, the zip file includes audio clips that were filtered out during post-processing; you can simply ignore them.
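
One way to ignore the filtered-out SoundBible clips is to keep only the files whose names appear in the JSON metadata. The sketch below assumes that file stems equal the ids in the JSON file; `split_listed` is a hypothetical helper, not part of the WavCaps code.

```python
from pathlib import Path


def split_listed(paths, listed_ids):
    """Split audio paths into those listed in the metadata and those to ignore."""
    keep, ignore = [], []
    for path in map(Path, paths):
        # Assumption: the file stem (name without extension) is the metadata id.
        (keep if path.stem in listed_ids else ignore).append(path)
    return keep, ignore


keep, ignore = split_listed(
    ["SoundBible/1.flac", "SoundBible/2.flac", "SoundBible/3.flac"],
    listed_ids={"1", "3"},
)
print(len(keep), "kept;", len(ignore), "ignored")
```

The ignored files can stay on disk; only the `keep` list needs to be passed on to preprocessing.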

duduOliver commented 1 year ago

Thanks for your message! Great, now I can relax and wait for your FreeSound update. So for BBC Sound Effects, the claimed number of audio clips is simply the count before post-processing, right? And you removed some audio during post-processing. BTW, do you plan to release the post-processing scripts? I think I'll keep the issue open until you update the datasets, in case someone has the same question. Good luck!

XinhaoMei commented 1 year ago

Hi, for the other data sources, please refer to the provided JSON files. The number of audio clips in BBC Sound Effects is 31201; 31291 is a typo. Sorry for this. The post-processing is the one we introduced in the paper. Some audio clips were filtered out during this process.

XinhaoMei commented 1 year ago

Hi, the missing files have been uploaded to HuggingFace!

duduOliver commented 1 year ago

Thank you very much! I've verified that all the counts now match.