This repository aims to provide efficient CNNs for audio tagging. We provide AudioSet pre-trained models that are ready for downstream training and for extracting audio embeddings.
Is your question about tagging audio at a higher temporal resolution (frame-level)? If so, I would point you to another issue where this topic has been discussed: https://github.com/fschmid56/EfficientAT/issues/3
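As a rough illustration of what "higher resolution" could mean in practice, one generic option is to run a clip-level tagger on short overlapping windows and collect one score vector per window. This is only a sketch under my own assumptions and not necessarily the approach discussed in the linked issue: `windowed_tagging`, `dummy_tagger`, `window_size` and `hop_size` are hypothetical names, and `dummy_tagger` is a stand-in for a real model, not part of this repository's API.

```python
from typing import Callable, List, Sequence


def windowed_tagging(
    waveform: Sequence[float],
    clip_tagger: Callable[[Sequence[float]], List[float]],
    window_size: int,
    hop_size: int,
) -> List[List[float]]:
    """Apply a clip-level tagger to overlapping windows (hypothetical helper).

    Returns one score vector per window, in temporal order.
    """
    if window_size <= 0 or hop_size <= 0:
        raise ValueError("window_size and hop_size must be positive")
    outputs = []
    # Only full windows are used; trailing samples that do not fill a
    # complete window are dropped. A waveform shorter than one window
    # is zero-padded so that at least one prediction is returned.
    last_start = max(len(waveform) - window_size, 0)
    for start in range(0, last_start + 1, hop_size):
        segment = list(waveform[start:start + window_size])
        segment += [0.0] * (window_size - len(segment))
        outputs.append(clip_tagger(segment))
    return outputs


def dummy_tagger(segment: Sequence[float]) -> List[float]:
    # Hypothetical stand-in for a real model: a single "energy" score.
    return [sum(x * x for x in segment) / len(segment)]


# Four silent samples followed by four samples at full amplitude.
scores = windowed_tagging([0.0] * 4 + [1.0] * 4, dummy_tagger,
                          window_size=4, hop_size=2)
print(scores)  # [[0.0], [0.5], [1.0]]
```

The trade-off is that shorter windows give finer time resolution but less context per prediction, and a model trained on long clips may behave differently on very short inputs.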
Hello, how can I get frame-level output results, similar to clipwise_output in PANNs?
Thanks