Environmental Sound Classification Setup

JVass / background_sensitivity_of_CNNs

This repository will contain the assignment for the module "Deep Learning for Audio and Music". The objective is to implement two deep neural network architectures for two different kinds of tasks and to try to combine their inferences. I chose to implement a U-Net for spectrogram denoising and a DenseNet for environmental sound classification.


Environmental Sound Classification Setup #2

Closed JVass closed 1 year ago

JVass commented 1 year ago

This will be the second section of the assignment: environmental sound classification using a DenseNet.

[x] Pretrained DenseNet import
[x] Pretrained DenseNet subclassing and super
[x] Spectrogram generation as images, based on Palanisamy et al. (for reproducibility), using the STFT and concatenation rather than MultiRes generation
[x] Freeze layers from DenseNet
[x] Attach an MLP for 10 classes of UrbanSound8k
[x] Tensorboard setup for evaluation
[x] Training loop setup
[x] Metrics used for the classification (Accuracy, F1 Score, Top-K Accuracy, Confusion Matrix)
[x] Save the model

JVass commented 1 year ago

https://pytorch.org/hub/pytorch_vision_densenet/ says input image has to be:

Normalized with mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]
Height and Width at least 224

Palanisamy et al. did neither of these: their UrbanSound8k inputs have HxW of (128, 250) and are not normalized.
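
For comparison, here is a minimal sketch of what following the PyTorch Hub requirements would involve for a (128, 250) spectrogram batch. It is not what Palanisamy et al. did. The function name `to_imagenet_input` and the per-sample min-max scaling to [0, 1] are assumptions for illustration; the mean and std values are the ones quoted above.

```python
import torch
import torch.nn.functional as F

# ImageNet statistics quoted on the PyTorch Hub DenseNet page.
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)


def to_imagenet_input(spec: torch.Tensor, min_size: int = 224) -> torch.Tensor:
    """Hypothetical helper. spec is a float tensor shaped (batch, 3, H, W)."""
    # Assumption: rescale each sample to [0, 1] with min-max scaling,
    # since spectrogram values are not image intensities.
    lo = spec.amin(dim=(1, 2, 3), keepdim=True)
    hi = spec.amax(dim=(1, 2, 3), keepdim=True)
    spec = (spec - lo) / (hi - lo).clamp_min(1e-8)

    # Upscale (keeping the aspect ratio) until both sides are >= min_size.
    h, w = spec.shape[-2:]
    scale = max(min_size / h, min_size / w, 1.0)
    if scale > 1.0:
        new_size = (max(min_size, round(h * scale)),
                    max(min_size, round(w * scale)))
        spec = F.interpolate(spec, size=new_size, mode="bilinear",
                             align_corners=False)

    # Channel-wise normalization with the ImageNet mean and std.
    return (spec - IMAGENET_MEAN) / IMAGENET_STD
```

Whether this helps on spectrograms is an open question here; the sketch only makes the difference between the two input conventions concrete.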

JVass commented 1 year ago

Global params for classification will be based on Palanisamy et al.: no early stopping (I discarded the learning rate scheduler), EPOCHS = 70, LR = 1e-4.
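
The settings above translate into a training loop along these lines. This is a sketch rather than the repository's loop: the `train` function, the Adam optimizer, and the cross-entropy loss are assumptions, since the issue only fixes the epoch count, the learning rate, and the absence of a scheduler and early stopping.

```python
import torch
import torch.nn as nn

EPOCHS = 70  # from the issue
LR = 1e-4    # from the issue


def train(model, loader, epochs=EPOCHS, lr=LR, device="cpu"):
    """Hypothetical loop: fixed LR, fixed epoch count, no scheduler,
    no early stopping. Returns the mean training loss per epoch."""
    model.to(device)
    # Only optimize parameters that are not frozen.
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(params, lr=lr)  # optimizer is an assumption
    criterion = nn.CrossEntropyLoss()            # loss is an assumption

    history = []
    for _ in range(epochs):
        model.train()
        running_loss, seen = 0.0, 0
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
            running_loss += loss.item() * inputs.size(0)
            seen += inputs.size(0)
        history.append(running_loss / seen)
    return history
```

The returned per-epoch losses can be logged to Tensorboard, for example with `SummaryWriter.add_scalar`.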

JVass commented 1 year ago

The results are not very promising, but this is as good as it will get for the assignment.