david-rx / BioLingual

Contrastive language-audio pretraining for bioacoustics
Apache License 2.0

Release of AnimalSpeak at HuggingFace #2

Open julian-carpenter opened 9 months ago

julian-carpenter commented 9 months ago

Congratulations on this very interesting and nice paper.

I was just looking through your released AnimalSpeak CSV file at HuggingFace. The file contains only 894,284 entries, but in the paper you mention 1.1M audio-caption pairs. Could you clarify where the discrepancy comes from?

Will you release a script that downloads all the snippets and processes everything so that it can be used to recreate the results from your paper? That would be a great help. Thank you very much.

david-rx commented 8 months ago

Hi Julian, thanks for the message. The main differences between the released set and the full set described in the paper:

  1. The released CSV doesn't contain AudioCaps. It's easier to get elsewhere, e.g. with a library like audiocaps-download.
  2. Several held-out sets have already been removed from the released version. These include an AnimalSpeak test set, a small eval set, and the eval and test sets from Watkins and CBI in the BEANS benchmark. I'll try to add at least the test set used for large-scale species prediction soon. Also, some Xeno-canto recordings with extremely frequent captions were deduplicated before release and training.
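
For readers wondering what deduplicating by caption frequency could look like, here is a minimal sketch. It is not the procedure used for AnimalSpeak: the `cap_frequent_captions` helper, the `caption` key, the normalization step, and the threshold of 2 are all assumptions made for illustration.

```python
from collections import Counter

def cap_frequent_captions(rows, max_per_caption=2):
    """Keep at most `max_per_caption` rows for any single caption.

    `rows` is a list of dicts with a hypothetical "caption" key. The
    threshold is an illustrative assumption, not the AnimalSpeak value.
    """
    seen = Counter()
    kept = []
    for row in rows:
        # Normalize so trivially different spellings count as one caption.
        key = row["caption"].strip().lower()
        if seen[key] < max_per_caption:
            kept.append(row)
            seen[key] += 1
    return kept

# Toy rows standing in for Xeno-canto entries (made-up data).
rows = [
    {"id": "a", "caption": "A bird singing"},
    {"id": "b", "caption": "a bird singing"},
    {"id": "c", "caption": "A bird singing "},
    {"id": "d", "caption": "A frog calling at night"},
]
kept = cap_frequent_captions(rows, max_per_caption=2)
print([r["id"] for r in kept])
```

The first two rows of a repeated caption are kept in input order and later repeats are dropped; a different policy (for example random sampling) would be equally plausible.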

After adding AudioCaps and processing, this should be the right set to approximately recreate the training. I don't yet have a clean script ready to download and process everything, but I hope to share one soon!
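
Until an official script exists, combining the released CSV with separately obtained AudioCaps captions might look roughly like the sketch below. The column names (`url`, `youtube_id`, `caption`), the `read_pairs` helper, and the inline sample rows are all hypothetical placeholders; check the actual headers of both files before adapting it.

```python
import csv
import io

def read_pairs(csv_text, audio_col, caption_col, source):
    """Parse CSV text into uniform audio-caption records.

    The column names are passed in because the real headers are not
    confirmed here; the values used below are assumptions.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {"audio": r[audio_col], "caption": r[caption_col], "source": source}
        for r in reader
    ]

# Tiny in-memory stand-ins for the two files (made-up rows).
released_csv = (
    "url,caption\n"
    "https://example.org/1.wav,A wren singing\n"
    "https://example.org/2.wav,A frog calling\n"
)
audiocaps_csv = (
    "youtube_id,caption\n"
    "abc123,A dog barks in the distance\n"
)

combined = (
    read_pairs(released_csv, "url", "caption", "animalspeak")
    + read_pairs(audiocaps_csv, "youtube_id", "caption", "audiocaps")
)
print(len(combined), "audio-caption pairs after merging")
```

Tagging each record with its `source` makes it easy to compare per-source counts against the numbers reported in the paper once the real files are loaded.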