fschmid56 / EfficientAT

This repository aims at providing efficient CNNs for Audio Tagging. We provide AudioSet pre-trained models ready for downstream training and extraction of audio embeddings.
MIT License

Obtain audio embeddings #19

Closed by cvillela 11 months ago

cvillela commented 11 months ago

Hello! Congratulations on the model; the results are very impressive. This is more of a question than an issue: is there a built-in way in this repo to extract the scene embeddings without performing the classification step? Much appreciated, Caio

fschmid56 commented 11 months ago

Hi! Thank you for the comment and the interest.

The forward pass is implemented so that it always returns both the classifier output and the scene embeddings. However, it should be very easy to skip the classification step and return only the features: an additional parameter passed to _forward_impl and a single if condition are probably sufficient.
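A minimal sketch of that idea follows. It is not the repository's actual code: TinyTagger, its layers, and the features_only parameter are hypothetical names chosen for illustration, and only the method name _forward_impl comes from the comment above. The real model's signature and return order may differ.

```python
import torch
from torch import nn


class TinyTagger(nn.Module):
    """Hypothetical stand-in for the real model, used only to show the pattern."""

    def __init__(self, embed_dim: int = 16, n_classes: int = 527):
        super().__init__()
        # Toy feature extractor: one conv layer followed by global average pooling.
        self.features = nn.Sequential(
            nn.Conv2d(1, embed_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.classifier = nn.Linear(embed_dim, n_classes)

    def _forward_impl(self, x: torch.Tensor, features_only: bool = False):
        features = self.features(x)  # shape: (batch, embed_dim)
        if features_only:
            # The single extra condition: skip the classification head.
            return features
        return self.classifier(features), features

    def forward(self, x: torch.Tensor, features_only: bool = False):
        return self._forward_impl(x, features_only=features_only)


model = TinyTagger().eval()
# Random placeholder input shaped (batch, channel, mel bins, frames).
mel = torch.randn(2, 1, 128, 100)
with torch.no_grad():
    embeddings = model(mel, features_only=True)
    logits, same_embeddings = model(mel)
```

Keeping the flag's default at False means existing callers that expect both outputs keep working unchanged.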

Best, Florian

cvillela commented 11 months ago

Hello, thanks for the fast response.

Is there any difference between modifying the forward pass in this repo and using the EfficientAT_Hear implementation of get_scene_embeddings()?

cvillela commented 11 months ago

Also, on a separate issue: I am getting all-NaN embeddings ([nan, ..., nan]) for some audio clips, even though I set the precision to float16, resample to 32 kHz, and follow the exact same pipeline. Do you have a hunch as to why this might be happening?

cvillela commented 11 months ago

I am closing the original issue, as it has been answered. Also, the [nan, ..., nan] embeddings occurred because I was not normalizing my audio prior to embedding extraction.
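For anyone hitting the same problem, here is a rough sketch of peak normalization before extraction. The helper name peak_normalize and the sample values are hypothetical, and the float16 overflow shown at the end is only one plausible explanation for the NaNs, not something confirmed in this thread.

```python
import numpy as np


def peak_normalize(waveform: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale a waveform so that its largest absolute sample is 1.0."""
    # Cast first: np.abs on int16 would overflow for the value -32768.
    waveform = waveform.astype(np.float32)
    peak = np.max(np.abs(waveform))
    if peak < eps:
        # Silent clip: return it unchanged to avoid dividing by ~0.
        return waveform
    return waveform / peak


# Made-up samples in raw int16 range, as some loaders return them.
raw = np.array([0, 16384, -32768, 8192], dtype=np.int16)
normalized = peak_normalize(raw)

# Assumed mechanism: float16 tops out at 65504, so squaring large
# un-normalized samples overflows to inf, which can later turn into NaN.
with np.errstate(over="ignore"):
    energy_fp16 = np.square(raw.astype(np.float16))
energy_fp16_normalized = np.square(normalized.astype(np.float16))
```

After normalization every sample lies in [-1, 1], so the squared values stay well inside the float16 range.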