fschmid56 / EfficientAT

This repository aims at providing efficient CNNs for Audio Tagging. We provide AudioSet pre-trained models ready for downstream training and extraction of audio embeddings.
MIT License

How to accurately identify the sound event offset? #5

Closed (joewale closed this issue 1 year ago)

joewale commented 1 year ago

Hi fschmid, I have a question: How to accurately identify the sound event offset? As is known, the sound events in our life are endless, and some events are not among the sound event categories in AudioSet. How to accurately identify these?

fschmid56 commented 1 year ago

Hi Joewale,

How to accurately identify the sound event offset?

I think what you are looking for is what we discussed here. There will be more work from my side in this direction in the near future.

As is known, the sound events in our life are endless, and some events are not among the sound event categories in AudioSet. How to accurately identify these?

Do you have a small dataset for the new sound events? Then you have two options: use the models in this repo as embedding extractors and train a shallow classifier on top of them (e.g. a simple one-layer MLP, see here for examples of how to get embeddings), or fine-tune the models in this repo, as done in this file.
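To make the first option concrete, here is a minimal, framework-free sketch of training a single linear layer with softmax on top of fixed embeddings. It is not code from this repo: the function names (`train_linear_head`, `make_toy_embeddings`, and so on) are hypothetical, and the toy Gaussian vectors only stand in for embeddings you would extract with one of the pre-trained models. In practice you would probably write the same thing with your deep learning framework's linear layer and cross-entropy loss.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, x):
    """Class probabilities of the linear head for one embedding x."""
    logits = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    return softmax(logits)


def train_linear_head(embeddings, labels, num_classes, lr=0.1, epochs=20, seed=0):
    """Fit one linear layer with SGD on the cross-entropy loss.

    The embeddings are treated as fixed inputs; only the head is trained.
    """
    rng = random.Random(seed)
    dim = len(embeddings[0])
    weights = [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(num_classes)]
    biases = [0.0] * num_classes
    order = list(range(len(embeddings)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = embeddings[i], labels[i]
            probs = predict_proba(weights, biases, x)
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit c is (p_c - 1[c == y]).
                grad = probs[c] - (1.0 if c == y else 0.0)
                biases[c] -= lr * grad
                for j in range(dim):
                    weights[c][j] -= lr * grad * x[j]
    return weights, biases


def accuracy(weights, biases, embeddings, labels):
    """Fraction of embeddings whose most probable class matches the label."""
    correct = 0
    for x, y in zip(embeddings, labels):
        probs = predict_proba(weights, biases, x)
        if probs.index(max(probs)) == y:
            correct += 1
    return correct / len(labels)


def make_toy_embeddings(num_classes=3, per_class=30, dim=8, seed=0):
    """ASSUMPTION: synthetic stand-in for embeddings from a pre-trained model."""
    rng = random.Random(seed)
    xs, ys = [], []
    for c in range(num_classes):
        for _ in range(per_class):
            x = [rng.gauss(0.0, 0.3) for _ in range(dim)]
            x[c] += 2.0  # shift one coordinate so the classes are separable
            xs.append(x)
            ys.append(c)
    return xs, ys


if __name__ == "__main__":
    train_x, train_y = make_toy_embeddings(seed=0)
    test_x, test_y = make_toy_embeddings(seed=1)
    w, b = train_linear_head(train_x, train_y, num_classes=3)
    print("held-out accuracy:", accuracy(w, b, test_x, test_y))
```

With real data, you would replace `make_toy_embeddings` with embeddings computed once from your labeled clips, which keeps training cheap because the pre-trained model itself is never updated.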

joewale commented 1 year ago

Thank you for your reply! Sorry, maybe my description was unclear. My question is how to avoid the audio of the 10s duration, which is not the sound event type in our label, being wrongly recognized as a sound event label in our training set.

fschmid56 commented 1 year ago

I'm still not sure what you mean. Can you elaborate on what you mean by "sound event type in our label" and "avoid the audio of 10s duration"? For the latter, would you like to tag shorter audio clips?