This repository aims to provide efficient CNNs for audio tagging. We provide AudioSet pre-trained models ready for downstream training and for extracting audio embeddings.
Thanks for the great work. I would like to ask about the input shape (i.e., x) expected by the MobileNet model: is it (batch_size, 1, time_steps, mel_bins) or (batch_size, 1, mel_bins, time_steps)?
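To make the two candidate layouts concrete, here is a minimal NumPy sketch. The sizes (4 clips, 128 mel bins, 1000 frames) are made-up illustrative values, not taken from the repository, and the snippet does not say which layout the model actually expects. It only shows that the two differ by a swap of the last two axes.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
batch_size, mel_bins, time_steps = 4, 128, 1000

# Candidate A: (batch_size, 1, time_steps, mel_bins)
x_time_first = np.zeros((batch_size, 1, time_steps, mel_bins), dtype=np.float32)

# Candidate B: (batch_size, 1, mel_bins, time_steps),
# obtained from A by swapping the last two axes.
x_mel_first = np.swapaxes(x_time_first, 2, 3)

print(x_time_first.shape)  # (4, 1, 1000, 128)
print(x_mel_first.shape)   # (4, 1, 128, 1000)
```

Whichever layout turns out to be correct, converting between the two is a single axis swap, so the features themselves would not need to be recomputed.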