fschmid56 / EfficientAT

This repository aims at providing efficient CNNs for Audio Tagging. We provide AudioSet pre-trained models ready for downstream training and extraction of audio embeddings.
MIT License

Question on use case #31

Open HeChengHui opened 3 months ago

HeChengHui commented 3 months ago

@fschmid56 Thank you for your work! I am looking for ways to detect certain sounds among background noise. Could this method be used to fine-tune a model that detects a small number of sounds while ignoring the background noise?

fschmid56 commented 2 weeks ago

Hi @HeChengHui, sorry for the late reply, I have been busy lately.

If I understand you correctly, you have sound files containing some events plus background noise, and you would like to ignore the background noise and detect the events, correct?

Do you have a dataset labeled with the events you would like to detect? In this case, you could just fine-tune one of the models of this repo on your dataset (as done e.g. in ex_fsd50k.py).
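To make the "labeled dataset" part more concrete, here is a minimal sketch of how such labels could be encoded for multi-label training. It assumes a multi-label setup in which a clip containing only background noise simply gets an all-zero target. The event names and the helpers `encode_targets` and `bce_loss` are hypothetical, for illustration only, and are not taken from this repo.

```python
import math

# Hypothetical event classes for illustration; replace with your own labels.
EVENT_CLASSES = ["glass_break", "door_slam", "alarm"]


def encode_targets(clip_events):
    """Return a multi-hot target vector for one clip.

    A clip that contains only background noise has no events and
    therefore maps to an all-zero vector.
    """
    unknown = set(clip_events) - set(EVENT_CLASSES)
    if unknown:
        raise ValueError(f"unknown event labels: {sorted(unknown)}")
    return [1.0 if name in clip_events else 0.0 for name in EVENT_CLASSES]


def bce_loss(logits, targets):
    """Mean binary cross-entropy over classes, computed from raw logits."""
    total = 0.0
    for z, t in zip(logits, targets):
        # Numerically stable form: max(z, 0) - z*t + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(targets)


# A background-only clip and a clip with two overlapping events.
print(encode_targets([]))
print(encode_targets(["alarm", "glass_break"]))
```

With targets like these, background noise never needs its own class: the model is only rewarded for predicting the listed events and for staying silent otherwise.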

You could also try to run the AudioSet pre-trained models without fine-tuning. Maybe the events you would like to detect are already present in the AudioSet ontology and the distribution shift between your audio clips and clips from AudioSet is not too severe.
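As a rough sketch of the no-fine-tuning route: once a pre-trained model has produced per-class probabilities for a clip, you could keep only the classes you care about and ignore everything else, including background classes. The function `detect_events`, the example probabilities, and the threshold value are assumptions for illustration; how you obtain the probabilities depends on the model you run.

```python
def detect_events(class_probs, target_labels, threshold=0.5):
    """Keep only the target labels whose probability reaches the threshold.

    class_probs: dict mapping label name -> probability in [0, 1]
    target_labels: labels you care about; all other labels are ignored
    """
    hits = {
        label: class_probs[label]
        for label in target_labels
        if class_probs.get(label, 0.0) >= threshold
    }
    # Sort detected events by descending probability.
    return sorted(hits.items(), key=lambda item: item[1], reverse=True)


# Hypothetical model output for one clip (values made up for illustration).
example_probs = {"Speech": 0.82, "Dog": 0.64, "Siren": 0.07, "Wind": 0.91}
print(detect_events(example_probs, ["Dog", "Siren"], threshold=0.3))
```

The threshold would likely need tuning per class on a few of your own clips, since the scores can shift when the recording conditions differ from AudioSet.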

Or is your goal to filter out the background noise?