aisynth / diffmoog

MIT License
92 stars 3 forks source link

DiffMoog: A Modular Differentiable Commercial-like Synthesizer

Alt Text

This repo contains the implementation of DiffMoog, a differential, subtractive, modular synthesizer, incorporating standard architecture and sound modules commonly found in commercial synthesizers.

The repo contains code for the synthesizer modules and architecture, as well as a platform for training and evaluating sound matching models using the synthesizer as a basis.

Paper

https://arxiv.org/abs/2401.12570

Getting started

To get started, simply clone the repo and install the requirements in requirements.txt, then follow the instructions below.

Dataset generation

To create a dataset, run the file src/dataset/create_data.py with the following arguments:

Example:

-g 0 -s train --size 50000 -n reduced_simple_osc -c REDUCED_SIMPLE_OSC -sd 4.0 -no 3.0 -bs 1

The dataset will be saved in location root/data/{dataset_name}.

Model Training

Preliminary Conditions

Before you start with the model training, ensure the following preliminary conditions are met:

  1. Dataset Conditions

    • A dataset has been created in the directory root/data/{dataset_name} with four distinct folders: train, val, train_nsynth, and val_nsynth. The recommended dataset sizes are:

      • train: 50,000 samples
      • val: 5,000 samples
      • train_nsynth: 18,000 samples
      • val_nsynth: 2,000 samples
    • The train and val folders are populated automatically by the create_data.py script as described in the dataset generation section above. This script generates several files inside these folders including {sound_id}.wav, commit_and_args.txt, params_dataset.csv, and params_dataset.pkl.

    • The val_nsynth and train_nsynth folders need to be manually created by the user and should contain the NSynth dataset. Place the .wav files inside a subfolder named wav_files. You can use the notebook root/misc_notebooks/get_nsynth_dataset.ipynb to assist with this process.

  2. Configuration File

    • There is a configuration file available (probably at root/configs/{config_name}.yaml) that contains the model configuration settings. Refer to the example configuration file root/configs/example_config.yaml for an explanation of the different fields and options.

Directory Structure

The directory structure should resemble the hierarchy below:

root/
└── data/
    └── {dataset_name}/
        ├── train/                 # Generated by create_data.py script
        │   ├── wav_files/         
        │   │   └── {sound_id}.wav  # Generated files
        │   ├── commit_and_args.txt  # Generated files
        │   ├── params_dataset.csv   # Generated files
        │   └── params_dataset.pkl   # Generated files
        ├── val/                   # Generated by create_data.py script
        │   ├── wav_files/         
        │   │   └── {sound_id}.wav  # Generated files
        │   ├── commit_and_args.txt  # Generated files
        │   ├── params_dataset.csv   # Generated files
        │   └── params_dataset.pkl   # Generated files
        ├── val_nsynth/
        │   └── wav_files/
        │       └── {sound_id}.wav
        └── train_nsynth/
            └── wav_files/
                └── {sound_id}.wav

Training

To train the model, execute the src/main.py script with the following arguments:

Example:

-g 0 -e params_only -d modular_chain_dataset -c root/configs/paper_configs_reduced/params_only_config.yaml

Paper

The paper is available in the repository at root/paper.pdf. Supplementary material is at root/paper_supplementary.

Notebooks

In the examples directory, you will find several notebooks to help get you familiar with the DiffMoog synthesizer and its capabilities.

Start with explore_chains.ipynb and create_dataset.ipynb for introduction about the synth structure, configuration and sound generation.

Use train_model.ipynb and evaluate_model.ipynb to train and evaluate a sound matching model.