DSGT-DLP / Deep-Learning-Playground

Web application where people new to deep learning can input a dataset and toy around with basic PyTorch modules without writing any code
MIT License

[FEATURE]: Integration of M5 Network Architecture for Audio Processing #1161

Open codingwithsurya opened 7 months ago

codingwithsurya commented 7 months ago

Feature Name

To enhance the Deep Learning Playground's audio data processing capabilities, we aim to integrate the M5 network architecture. This architecture processes raw audio waveforms efficiently, in large part because of the wide receptive field of the first layer's filters.
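
For a rough sense of scale (assuming the 8 kHz sample rate used in the referenced paper and tutorial): the first convolution uses an 80-sample kernel, so each first-layer filter covers 80 / 8000 = 10 ms of audio, roughly the window size of a typical spectrogram frame.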

Your Name

Surya Subramanian

Description

We are currently building an audio trainspace in the Deep Learning Playground. As part of this, we need to integrate a convolutional neural network that processes raw audio data. The specific model we want to implement is the M5 network architecture, described in detail in this paper: https://arxiv.org/pdf/1610.00087.pdf.

Here is the Python code for the M5 network architecture:

(also available here: https://colab.research.google.com/github/pytorch/tutorials/blob/gh-pages/_downloads/audio_classifier_tutorial.ipynb#scrollTo=iXUe9kHdcV16)

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    """M5 network: four conv/batch-norm/max-pool blocks over the raw waveform,
    followed by global average pooling and a single fully connected layer."""

    def __init__(self):
        super(Net, self).__init__()
        # First layer: large kernel (80 samples) with stride 4 to widen the receptive field
        self.conv1 = nn.Conv1d(1, 128, 80, 4)
        self.bn1 = nn.BatchNorm1d(128)
        self.pool1 = nn.MaxPool1d(4)
        self.conv2 = nn.Conv1d(128, 128, 3)
        self.bn2 = nn.BatchNorm1d(128)
        self.pool2 = nn.MaxPool1d(4)
        self.conv3 = nn.Conv1d(128, 256, 3)
        self.bn3 = nn.BatchNorm1d(256)
        self.pool3 = nn.MaxPool1d(4)
        self.conv4 = nn.Conv1d(256, 512, 3)
        self.bn4 = nn.BatchNorm1d(512)
        self.pool4 = nn.MaxPool1d(4)
        self.avgPool = nn.AvgPool1d(30)  # input should be 512x30, so this outputs 512x1
        self.fc1 = nn.Linear(512, 10)    # 10 output classes

    def forward(self, x):
        # Each block: convolution -> batch norm -> ReLU -> max pooling
        x = self.conv1(x)
        x = F.relu(self.bn1(x))
        x = self.pool1(x)
        x = self.conv2(x)
        x = F.relu(self.bn2(x))
        x = self.pool2(x)
        x = self.conv3(x)
        x = F.relu(self.bn3(x))
        x = self.pool3(x)
        x = self.conv4(x)
        x = F.relu(self.bn4(x))
        x = self.pool4(x)
        x = self.avgPool(x)
        x = x.permute(0, 2, 1)  # change the 512x1 to 1x512
        x = self.fc1(x)
        return F.log_softmax(x, dim=2)
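
As a quick sanity check, here is a minimal sketch of running the model on a dummy batch (assuming the roughly 4-second, 8 kHz clips used in the linked tutorial, i.e. 32000-sample waveforms; with that length the tensor entering avgPool is indeed 512x30):

import torch

model = Net()
waveforms = torch.randn(8, 1, 32000)  # batch of 8 mono waveforms, 32000 samples each
out = model(waveforms)                # log-probabilities over the 10 classes
print(out.shape)                      # torch.Size([8, 1, 10])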

The task is to integrate this model into our training directory. The model should be callable from the audio.py route. The files training/core/trainer.py and training/core/dl_model.py might be useful for this integration.

This is kind of open-ended so feel free to play around with this! Lmk if you have any questions.


github-actions[bot] commented 7 months ago

Hello @codingwithsurya! Thank you for submitting the Feature Request Form. We appreciate your contribution. :wave:

We will look into it and provide a response as soon as possible.

To work on this feature request, you can follow these branch setup instructions:

  1. Check out the nextjs branch:

     git checkout nextjs

  2. Pull the latest changes from the remote nextjs branch:

     git pull origin nextjs

  3. Create a new branch specific to this feature request using the issue number:

     git checkout -b feature-1161

Feel free to make the necessary changes in this branch and submit a pull request when you're ready.

Best regards,
Deep Learning Playground (DLP) Team

codingwithsurya commented 6 months ago

Here's how I think we can incorporate the M5 network architecture. Under the training/core directory, we can make a file called m5_model.py. In that file, we define the M5 network architecture as a class M5Net that extends nn.Module.

import torch.nn as nn
import torch.nn.functional as F

class M5Net(nn.Module):
    def __init__(self):
        super(M5Net, self).__init__()
        self.conv1 = nn.Conv1d(1, 128, 80, 4)
        self.bn1 = nn.BatchNorm1d(128)
        self.pool1 = nn.MaxPool1d(4)
        self.conv2 = nn.Conv1d(128, 128, 3)
        self.bn2 = nn.BatchNorm1d(128)
        self.pool2 = nn.MaxPool1d(4)
        self.conv3 = nn.Conv1d(128, 256, 3)
        self.bn3 = nn.BatchNorm1d(256)
        self.pool3 = nn.MaxPool1d(4)
        self.conv4 = nn.Conv1d(256, 512, 3)
        self.bn4 = nn.BatchNorm1d(512)
        self.pool4 = nn.MaxPool1d(4)
        self.avgPool = nn.AvgPool1d(30)
        self.fc1 = nn.Linear(512, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(self.bn1(x))
        x = self.pool1(x)
        x = self.conv2(x)
        x = F.relu(self.bn2(x))
        x = self.pool2(x)
        x = self.conv3(x)
        x = F.relu(self.bn3(x))
        x = self.pool3(x)
        x = self.conv4(x)
        x = F.relu(self.bn4(x))
        x = self.pool4(x)
        x = self.avgPool(x)
        x = x.permute(0, 2, 1)
        x = self.fc1(x)
        return F.log_softmax(x, dim=2)

In the training/core/dl_model.py file, we can import the M5Net class: from .m5_model import M5Net

and, in the DLModel class in dl_model.py, we can add a new method to create an instance of M5Net:

def fromM5Net(self):
    return M5Net()

In the training/routes/audio/audio.py file, we can use the DLModel.fromM5Net() method to create an instance of the M5 network when needed.
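
A minimal sketch of what that call site could look like (the handler name and the DLModel usage below are hypothetical placeholders, since the exact shape of audio.py and DLModel isn't shown in this issue):

from training.core.dl_model import DLModel  # existing DLP class referenced in this issue

def handle_audio_training(request):  # hypothetical route handler name
    dl_model = DLModel()             # hypothetical construction; adapt to the real class
    model = dl_model.fromM5Net()     # proposed factory method returning an M5Net instance
    # ... pass `model` and the parsed audio dataset to the existing training pipeline
    return model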

We also need to update the training pipeline in training/core/trainer.py, but this should be enough to get started.
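
One detail to keep in mind for the trainer changes: M5Net returns log-probabilities of shape (batch, 1, 10), so the loss should be NLLLoss on a squeezed output. Here is a minimal sketch of a single training step under that assumption (the optimizer settings mirror the linked tutorial; the rest is illustrative, not the actual trainer.py code):

import torch.nn as nn
from torch.optim import Adam

from training.core.m5_model import M5Net  # the module proposed above

model = M5Net()
optimizer = Adam(model.parameters(), lr=0.01, weight_decay=0.0001)
criterion = nn.NLLLoss()  # pairs with the log_softmax output of M5Net

def train_step(waveforms, labels):
    # waveforms: (batch, 1, num_samples) raw audio; labels: (batch,) class indices
    optimizer.zero_grad()
    output = model(waveforms).squeeze(1)  # (batch, 1, 10) -> (batch, 10)
    loss = criterion(output, labels)
    loss.backward()
    optimizer.step()
    return loss.item()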