Closed by cvillela 11 months ago
Hi! Thank you for the comment and the interest.
The forward pass is implemented so that you always get both the classifier output and the scene embeddings. However, it should be very easy to skip the classification step and return only the features (an additional parameter passed to `_forward_impl` and a single if condition are probably sufficient).
Best, Florian
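A minimal sketch of the change suggested above, using a toy stand-in rather than the repository's actual model. The class `ToyTagger`, its helper methods, and the parameter name `return_features_only` are hypothetical and chosen only for illustration; the real `_forward_impl` may be structured differently.

```python
from typing import List, Tuple, Union

Vector = List[float]


class ToyTagger:
    """Hypothetical stand-in for the real model; not the repository's code."""

    def __init__(self, weights: List[Vector]) -> None:
        # One weight row per class; a stand-in for the classifier head.
        self.weights = weights

    def _features(self, x: Vector) -> Vector:
        # Stand-in for the backbone that produces the scene embedding.
        return [max(v, 0.0) for v in x]

    def _classify(self, features: Vector) -> Vector:
        # Stand-in for the classification step (a plain linear layer).
        return [sum(w * f for w, f in zip(row, features)) for row in self.weights]

    def _forward_impl(
        self, x: Vector, return_features_only: bool = False
    ) -> Union[Vector, Tuple[Vector, Vector]]:
        features = self._features(x)
        if return_features_only:
            # The single extra condition: skip the classifier entirely.
            return features
        return self._classify(features), features


model = ToyTagger(weights=[[1.0, 0.0], [0.0, 2.0]])
embedding = model._forward_impl([-1.0, 3.0], return_features_only=True)
logits, same_embedding = model._forward_impl([-1.0, 3.0])
```

Keeping the default at `False` leaves existing callers unchanged, since they still receive the (classifier output, embeddings) pair.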
Hello, thanks for the fast response.
Is there any difference between modifying the forward pass in this repo and using the EfficientAT_Hear implementation of `get_scene_embeddings()`?
Also, on a separate issue, I am getting [nan, ..., nan] embeddings for some audio clips, even though I am setting the precision to float16, resampling to 32 kHz, and following the exact same pipeline. Would you have a hunch as to why this may be happening?
I am closing the original issue as it has been answered. Also, the [nan, ..., nan] embeddings occurred because I was not normalizing my audio prior to embedding extraction.
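The thread does not say which normalization was applied, so the following is only a sketch under the assumption that simple peak normalization to the range [-1, 1] is meant. The function name `peak_normalize` and the `eps` threshold are hypothetical. One plausible reason this matters is that float16 cannot represent values above roughly 65504, so large-amplitude inputs can overflow in half precision.

```python
from typing import List


def peak_normalize(samples: List[float], eps: float = 1e-8) -> List[float]:
    """Scale a waveform so that its largest absolute sample equals 1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak < eps:
        # Silent or empty clip: return a copy to avoid dividing by ~0.
        return list(samples)
    return [s / peak for s in samples]


# Example: samples still in 16-bit integer range are mapped into [-1, 1].
normalized = peak_normalize([0.0, 16384.0, -32768.0])
```

The normalized waveform would then be passed to the embedding pipeline in place of the raw samples.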
Hello! Congratulations on the model, very impressive results. This is more of a question than it is an issue. I was wondering if there is a configured way to extract the scene embeddings w/o performing the classification step on this repo. Much appreciated, Caio