fschmid56 / EfficientAT

This repository aims at providing efficient CNNs for Audio Tagging. We provide AudioSet pre-trained models ready for downstream training and extraction of audio embeddings.
MIT License

Pretrained model with 1-second frame width? #20

Closed (turian closed this issue 10 months ago)

turian commented 10 months ago

Could you release a pretrained model with a much shorter receptive field, for example about 1 second?

This would be useful for fine-grained tasks such as music transcription and sound event transcription, which need a smaller hop size.

turian commented 10 months ago

I've now noticed that the convolutional layers already do this, so you can just remove the final averaging step.
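To make the point about receptive length and hop size concrete, here is a minimal sketch of how both follow from the kernel sizes and strides of a convolution stack along the time axis. The layer list and the 10 ms spectrogram hop are assumptions chosen for illustration, not the actual EfficientAT configuration, and `receptive_field` and `pool_over_time` are hypothetical helper names.

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (receptive_field, hop) in input frames for a stack of
    (kernel_size, stride) convolutions applied along the time axis."""
    rf, hop = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * hop  # each layer widens the field by (k - 1) * current hop
        hop *= stride             # strides multiply to give the output hop
    return rf, hop


def pool_over_time(frame_scores: List[float]) -> float:
    """Clip-level score: the mean of the frame-level scores.
    Skipping this step keeps one score per output frame."""
    return sum(frame_scores) / len(frame_scores)


# Hypothetical stack of (kernel_size, stride) pairs, NOT the real model.
layers = [(3, 2), (3, 1), (3, 2), (5, 2), (3, 1)]
rf, hop = receptive_field(layers)

frame_hop_ms = 10  # assumed spectrogram hop in milliseconds
print(f"receptive field: {rf} input frames (~{rf * frame_hop_ms} ms)")
print(f"output hop: {hop} input frames (~{hop * frame_hop_ms} ms)")

# Frame-level scores from the last conv layer (made-up numbers).
frame_scores = [0.1, 0.9, 0.8, 0.2]
print("clip-level score:", pool_over_time(frame_scores))
print("frame-level scores:", frame_scores)
```

The same arithmetic applies to any convolutional tagger: each output frame before the pooling step only sees a bounded window of the input, so dropping the time average yields a sequence of predictions at the network's output hop rather than one prediction per clip.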