fschmid56 / EfficientAT

This repository aims at providing efficient CNNs for Audio Tagging. We provide AudioSet pre-trained models ready for downstream training and extraction of audio embeddings.
MIT License
218 stars 41 forks

Model input shape #16

Closed: cxmscb closed this issue 1 year ago

cxmscb commented 1 year ago

Thanks for the great work. I would like to ask about the input shape (i.e., the shape of x) of the MobileNet model: is it (batch_size, 1, time_steps, mel_bins) or (batch_size, 1, mel_bins, time_steps)?

x = _mel_forward(x, mel)
y_hat, _ = model(x)

fschmid56 commented 1 year ago

Thank you for your appreciation.

The model's input has shape (batch_size, 1, mel_bins, time_steps).
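
A minimal sketch of what this layout looks like in PyTorch, in case it helps. The sizes used here (2 clips, 128 mel bins, 1000 frames) are made-up values for illustration only; the real numbers depend on how the mel front end is configured. The second half assumes, hypothetically, that features were stored time-major and shows one way to swap the last two axes.

```python
import torch

# Hypothetical sizes for illustration only; the actual values depend on
# the mel front end configuration and the clip length.
batch_size, mel_bins, time_steps = 2, 128, 1000

# A dummy batch in the layout given in the answer:
# (batch_size, 1, mel_bins, time_steps)
x = torch.randn(batch_size, 1, mel_bins, time_steps)
assert tuple(x.shape) == (batch_size, 1, mel_bins, time_steps)

# Assumption: features stored time-major, i.e.
# (batch_size, 1, time_steps, mel_bins). Swapping the last two axes
# yields the mel-major layout described above.
x_time_major = torch.randn(batch_size, 1, time_steps, mel_bins)
x_fixed = x_time_major.transpose(2, 3).contiguous()

print(tuple(x_fixed.shape))
```

A tensor shaped like `x_fixed` could then be passed to the model in place of `x` in the snippet from the question.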

cxmscb commented 1 year ago

Thanks for the reply.