forresti / SqueezeNet

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters
BSD 2-Clause "Simplified" License

squeezenet for speech #64

Open akankshaaa13 opened 2 years ago

akankshaaa13 commented 2 years ago

Can SqueezeNet be used for speech emotion recognition if we feed it 3D log-mel spectrogram values?

dragon18456 commented 2 years ago

There is an established paradigm for speech emotion recognition (SER) in which a backbone like SqueezeNet serves as the feature extractor. Given some audio, you want to classify the speech into a fixed number of discrete classes such as happy, sad, angry, etc. You can run the log-mel spectrogram features (or MFCCs, if you prefer) through SqueezeNet, discarding its classification layer. This leaves you with an activation tensor whose length depends on the length of the input, so you need to reduce it to a predefined size. Some works use RNNs or LSTMs for this step, with mixed results. If you are just starting on SER, something simple like global average pooling is a good place to start. From there, a single fully connected (FC) classification layer gives you your logits.

akankshanarahari commented 2 years ago

What are the classification layers in SqueezeNet?

forresti commented 2 years ago

The final layer of SqueezeNet outputs a 1-dimensional vector with length equal to the number of categories. For example, if you are classifying images and you have 1000 categories, each image will get a 1000-d vector. The model's predicted class is the index of the element with the highest numerical value.

If you're classifying emotions from audio data and you have 10 different emotions (e.g. happy, sad, confused, distracted, ...), then you would want to configure the model to have a 10-dimensional output vector.

One other note - this code repository is over 5 years old and uses a neural network framework called Caffe. Caffe is pretty old at this point, and I have since switched to using PyTorch. (It can be debated whether PyTorch or TensorFlow is better; I personally prefer PyTorch.) If you install PyTorch and Torchvision, there is an easy-to-use implementation of SqueezeNet there: https://pytorch.org/hub/pytorch_vision_squeezenet/