Pretrained model with 1-second frame width?

fschmid56 / EfficientAT

This repository aims at providing efficient CNNs for Audio Tagging. We provide AudioSet pre-trained models ready for downstream training and extraction of audio embeddings.

MIT License


Pretrained model with 1-second frame width? #20

turian closed this issue 1 year ago

turian commented 1 year ago

I'm curious whether you could release a pretrained model with a much shorter receptive field, around 1 second?

This would be useful for fine-grained tasks such as music transcription and event transcription, which need a smaller hop size.

turian commented 1 year ago

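I've noticed that the convolutional layers already do this: each output time step only sees a limited span of the input, so you can just remove the final averaging over time and keep the frame-level outputs.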
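
To make the point above concrete, here is a rough sketch of how one could estimate the span each output step sees once the time averaging is dropped. The layer list and the 10 ms frame hop are assumptions for illustration, not the actual EfficientAT configuration, and `receptive_field` and `clip_level` are hypothetical helpers rather than functions from the repository.

```python
# Hypothetical (kernel_size, stride) pairs along the time axis.
# These are NOT the real EfficientAT layers, just an illustrative stack.
LAYERS = [(3, 2), (3, 1), (3, 2), (3, 2), (3, 1)]

# Assumed hop between input spectrogram frames, in milliseconds.
FRAME_MS = 10


def receptive_field(layers):
    """Return (receptive_field, hop) of one output step, in input frames."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the span by (k - 1) * current jump
        jump *= stride             # striding increases the spacing between output steps
    return rf, jump


def clip_level(frame_scores):
    """The averaging step: collapse frame-level scores into one clip-level score."""
    return sum(frame_scores) / len(frame_scores)


rf, hop = receptive_field(LAYERS)
print(f"each output step sees about {rf * FRAME_MS} ms, one step every {hop * FRAME_MS} ms")

# Keeping frame-level scores preserves timing; averaging discards it.
frame_scores = [0.1, 0.9, 0.8, 0.1]
print("frame-level:", frame_scores)
print("clip-level:", clip_level(frame_scores))
```

Under these assumed numbers, the frame-level outputs would already have a receptive field well under one second, which is why skipping the averaging may be enough without retraining a separate model. The estimate ignores padding and the STFT window length, so treat it as approximate.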