
Audiocraft Infinity WebUI

- Adds generation of songs with a length of over 30 seconds.
- Adds the ability to continue songs.
- Adds a seed option.
- Adds the ability to load locally downloaded models.
- Adds training (thanks to chavinlo's repo: https://github.com/chavinlo/musicgen_trainer).
- Adds macOS support.
- Adds a queue (on the main-queue branch: https://github.com/1aienthusiast/audiocraft-infinity-webui/tree/main-queue).
- Adds batching (run webuibatch.py instead of webui.py).
- Disables (hopefully) the Gradio analytics.

Note: the project is currently not actively maintained but accepts PRs.

Installation

Python 3.9 is recommended.
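Optionally, create a virtual environment first to keep the dependencies isolated (a setup sketch that is not part of the repo's own instructions, assuming python3.9 is on your PATH):

    # optional: create and activate a Python 3.9 virtual environment
    python3.9 -m venv venv
    source venv/bin/activate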

  1. Clone the repo: git clone https://github.com/1aienthusiast/audiocraft-infinity-webui.git
  2. Install pytorch: pip install 'torch>=2.0'
  3. Install the requirements: pip install -r requirements.txt
  4. Clone my fork of the Meta audiocraft repo and chavinlo's MusicGen trainer inside the repositories folder:
    cd repositories
    git clone https://github.com/1aienthusiast/audiocraft
    git clone https://github.com/chavinlo/musicgen_trainer
    cd ..

    Note!

    If you already cloned the Meta audiocraft repo, you have to remove it and then clone the provided fork for the seed option to work.

    cd repositories
    rm -rf audiocraft/
    git clone https://github.com/1aienthusiast/audiocraft
    git clone https://github.com/chavinlo/musicgen_trainer
    cd ..

Usage

python webui.py

python webuibatch.py - with batching support

Updating

Run git pull inside the root folder to update the webui, and the same command inside repositories/audiocraft to update audiocraft.
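Concretely, from the root folder:

    # update the webui itself
    git pull
    # update the audiocraft fork as well
    cd repositories/audiocraft
    git pull
    cd ../..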

Models

Meta provides 4 pre-trained models:

- small: 300M parameters, text-to-music only
- medium: 1.5B parameters, text-to-music only
- melody: 1.5B parameters, supports text-to-music and melody-guided generation
- large: 3.3B parameters, text-to-music only

Needs a GPU!

I recommend 12GB of VRAM for the large model.

Training

Dataset Creation

Create a folder and place your audio and caption files in it. They must be in WAV and TXT format, respectively.

Place the folder in training/datasets/.

Important: split your audio files into 35-second chunks. Only the first 30 seconds of each chunk will be processed; audio shorter than 30 seconds cannot be used.

In this example, segment_000.txt contains the caption "jazz music, jobim" for the WAV file segment_000.wav.
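A minimal sketch of the chunking step, assuming ffmpeg is installed and using a hypothetical dataset folder training/datasets/mydataset/ (neither the tool nor the folder name comes from this repo):

    # split a long recording into 35-second chunks
    ffmpeg -i recording.wav -f segment -segment_time 35 -c copy training/datasets/mydataset/segment_%03d.wav
    # give every chunk the same caption (adjust per file as needed)
    for f in training/datasets/mydataset/*.wav; do
        echo "jazz music, jobim" > "${f%.wav}.txt"
    done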

Options

For the available training options, see chavinlo's musicgen_trainer repo: https://github.com/chavinlo/musicgen_trainer

Models

Once training finishes, the model (and checkpoints) will be available under the models/ directory.

Loading the finetuned models

The model gets saved to models/ as lm_final.pt.

1. Place it in models/DIRECTORY_NAME/
2. In the Inference tab, choose custom as the model and enter DIRECTORY_NAME into the input field.
3. In the Inference tab, choose the model it was finetuned on.
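For example, with a hypothetical finetune named my_finetune (any DIRECTORY_NAME works):

    # "my_finetune" is a made-up directory name for illustration
    mkdir -p models/my_finetune
    mv models/lm_final.pt models/my_finetune/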

Colab

For Google Colab, you need to use the --share flag.
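For example (assuming the flag is passed directly to the launch command):

    # exposes the UI through a public Gradio share link
    python webui.py --share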

License

GNU Affero General Public License v3.0