The goal of the project is to prepare a machine learning module that can be hypothetically used in an automated, voice-based intercom device. Imagine that you are working in a team of several programmers and the access to your floor is restricted with doors. There is an intercom that can be used to open the door. You are implementing a machine learning module that will recognize if a given person has the permission to open the door or not. To simplify this problem, we interpret this as a binary recognition problem. Class 1 is made of allowed persons. Class 0 is made of not allowed persons. The medium used to distinguish between the two classes is voice. Thus, the project to be delivered is in fact a voice recognition project. To shift the focus to convolutional neural networks, we assume that the voice recording must be turned into spectrograms in the pre-processing stage. Thus, the project to be delivered is in fact a voice recognition project but implementing it will require techniques that can also be applied to different kinds of image processing. ~ from Introduction to Machine Learning WUT course project description.
iml/
│
├── data/ # Raw .wav files directory
├── datasets/ # Processed dataset directory with train, val, test spectrograms
├── notebooks/ # Jupyter notebooks for data preparation and training
│ ├── data_preparation_and_analysis.ipynb # Data processing and analysis notebook
│ ├── model_training.ipynb # Model training and logging notebook
├── src/ # Main source files
│ ├── __init__.py
│ ├── data_processing.py # Spectrogram generation, augmentation, dataset split
│ ├── dataset_analysis.py # Dataset exploration and visualization functions
│ ├── model.py # CNN models definitions
│ └── training.py # Training and validation functions
├── models/ # Saved model directory
├── requirements.txt # Python dependencies
└── README.md # Project description and setup guide
Install the dependencies from requirements.txt:
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
Alternatively, VS Code can create the environment automatically via the "Python: Create Environment..." action.
To use the Jupyter notebooks (the recommended way), the jupyter package must be installed.
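Place the raw .wav files in data/ (I used the "clean" directory). Watch out for stray files whose names start with ".": they are not valid .wav files and need to be removed.
Use the data_preparation_and_analysis.ipynb notebook to process the data and fill the datasets/ subdirectories (train, val, test) with spectrograms.
Use the model_training.ipynb notebook to train the models and save them to models/.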
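The stray dot-files mentioned above can be found and removed with a short script. This is a minimal sketch, assuming the raw recordings live under data/; the helper name remove_hidden_files is hypothetical and is not part of src/. It defaults to a dry run, so nothing is deleted until you ask for it.

```python
from pathlib import Path


def remove_hidden_files(root, dry_run=True):
    """Return files under `root` whose names start with '.'.

    With dry_run=False the files are also deleted.
    (Hypothetical helper, not part of the project's src/ package.)
    """
    hidden = sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.name.startswith(".")
    )
    if not dry_run:
        for path in hidden:
            path.unlink()
    return hidden


if __name__ == "__main__":
    # First list what would be removed, then rerun with dry_run=False.
    for path in remove_hidden_files("data"):
        print("would remove:", path)
```

Running it once in dry-run mode and checking the printed paths before deleting is the safer order, since the raw dataset may be slow to download again.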
Spectrogram generation is handled by the functions in data_processing.py:
load_audio(): Loads audio from a .wav file.
split_into_clips(): Splits an audio file into 3-second clips.
create_spectrogram(): Converts audio clips into log-mel spectrograms.

SimpleCNN is a simple CNN for 128x128 input images.
TutorialCNN is the CNN from the provided tutorial.
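The three spectrogram steps listed above (load, split into 3-second clips, convert to log-mel) can be sketched as follows. This is an illustration rather than the code in data_processing.py: the function names match the README, but the signatures, the 16-bit PCM restriction, and the parameters n_fft, hop, and n_mels are assumptions. The README does not say which audio library the project uses, so the sketch relies only on NumPy and the standard-library wave module.

```python
import wave

import numpy as np


def load_audio(path):
    """Load a 16-bit PCM .wav file as mono float32 samples in [-1, 1]."""
    with wave.open(str(path), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        sr = wf.getframerate()
        n_channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    if n_channels > 1:
        # Interleaved channels: average them down to mono.
        samples = samples.reshape(-1, n_channels).mean(axis=1)
    return samples, sr


def split_into_clips(samples, sr, clip_seconds=3.0):
    """Cut the signal into consecutive fixed-length clips; drop the short tail."""
    clip_len = int(clip_seconds * sr)
    n_clips = len(samples) // clip_len
    return [samples[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale (HTK formula)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m:m + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def create_spectrogram(clip, sr, n_fft=2048, hop=512, n_mels=128):
    """Return a log-mel spectrogram in dB with shape (n_mels, n_frames)."""
    if len(clip) < n_fft:
        raise ValueError("clip is shorter than one FFT window")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(clip) - n_fft) // hop
    frames = np.stack(
        [clip[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, n_fft // 2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T  # (n_mels, n_frames)
    return 10.0 * np.log10(np.maximum(mel, 1e-10))     # floor avoids log(0)


if __name__ == "__main__":
    # "data/example.wav" is a hypothetical path used only for illustration.
    samples, sr = load_audio("data/example.wav")
    for clip in split_into_clips(samples, sr):
        print(create_spectrogram(clip, sr).shape)
```

The number of frames depends on the sample rate and hop length, so the output is not 128x128 by default; resizing to the input size SimpleCNN expects would be a separate step that this sketch leaves out.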