This is the repository for my personal project: a simple web application that lets you generate music 🎵 through a chat interface 💬. Since music-generation AI models are trained largely on human-made music, the generated output is likely to resemble existing songs. To address this ethical concern, the project also provides a music similarity search feature 🔍 that detects whether a generated song contains snippets similar to existing music.
As mentioned above, this app provides 2 main features:

1. Music generation through a chat interface
2. Music similarity search
The user interface of the app is powered by Streamlit, and the backend API is implemented with FastAPI.
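As a rough illustration of how these two pieces talk to each other, the toy sketch below has the Streamlit frontend send a chat message to a FastAPI endpoint over HTTP. The endpoint path, port, and payload are invented for illustration and do not reflect the project's actual routes.

```python
# backend.py -- illustrative FastAPI endpoint, not the project's actual route
from fastapi import FastAPI

app = FastAPI()

@app.post("/generate")
def generate(payload: dict):
    # In the real app this would call the music generation pipeline.
    return {"status": "queued", "prompt": payload}
```

```python
# frontend.py -- Streamlit chat UI forwarding the message to the backend
import requests
import streamlit as st

message = st.chat_input("Describe the music you want")
if message:
    reply = requests.post("http://localhost:8000/generate", json={"text": message})
    st.write(reply.json())
```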
First, the music generation feature is mostly done by calling the Suno API. Since Suno has not published an official public API, each API call must be provided with the `Cookie` and `Session ID` of a logged-in Suno account.
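A minimal sketch of such a call is shown below. Everything here is an assumption for illustration: the endpoint URL, the header names, and the payload shape are hypothetical, and the real values must be copied from a logged-in Suno browser session (see the instructions on extracting the `Cookie` and `Session ID` later in this README).

```python
import requests

# Hypothetical, unofficial endpoint and headers; copy the real Cookie and
# Session ID from your logged-in browser session via the dev tools.
SUNO_API_URL = "https://studio-api.suno.ai/api/generate/v2/"  # assumed
COOKIE = "<your-suno-cookie>"
SESSION_ID = "<your-suno-session-id>"

response = requests.post(
    SUNO_API_URL,
    headers={"Cookie": COOKIE, "Session-Id": SESSION_ID},
    json={"prompt": "a lo-fi hip hop beat for studying"},
    timeout=60,
)
response.raise_for_status()
print(response.json())  # generation metadata returned by the service
```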
Second, the music similarity search feature is based on the audio fingerprinting research topic: a music embedding model extracts embeddings from audio, and a vector database stores and queries them. To deploy this feature, the project uses NVIDIA Triton Inference Server to serve the embedding model and the Milvus vector database for the retrieval task.
The music embedding model is trained with PyTorch and builds on the audio fingerprinting research field. For further training details, please see `./train/README.md`.
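To make the retrieval flow concrete, the sketch below shows how a query might travel through the two components. It is illustrative only: the model name `neuralfp` matches the Triton model repository used later in this README, but the tensor names (`INPUT__0`/`OUTPUT__0`, Triton's defaults for the PyTorch backend), the input shape, the collection name, and the field names are assumptions.

```python
import numpy as np
import tritonclient.http as httpclient
from pymilvus import Collection, connections

# 1. Extract an embedding from an audio segment via Triton.
#    Tensor names are Triton's PyTorch-backend defaults and may differ
#    from the actual model config; the segment shape is a placeholder.
triton = httpclient.InferenceServerClient(url="localhost:8000")
segment = np.random.rand(1, 8000).astype(np.float32)  # placeholder audio
inp = httpclient.InferInput("INPUT__0", segment.shape, "FP32")
inp.set_data_from_numpy(segment)
out = httpclient.InferRequestedOutput("OUTPUT__0")
result = triton.infer("neuralfp", inputs=[inp], outputs=[out])
embedding = result.as_numpy("OUTPUT__0")

# 2. Search Milvus for the nearest stored fingerprints.
#    Collection and field names are illustrative, not the project's schema.
connections.connect(host="localhost", port="19530")
collection = Collection("music_fingerprints")
collection.load()
hits = collection.search(
    data=embedding.tolist(),
    anns_field="embedding",
    param={"metric_type": "IP", "params": {"nprobe": 16}},
    limit=5,
    output_fields=["song_id"],
)
for hit in hits[0]:
    print(hit.entity.get("song_id"), hit.distance)
```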
| Music Generation Chat interface | Music Search interface |
| --- | --- |
To set up the project, please follow these steps:
```bash
# Clone the repository
git clone https://github.com/Huy1711/AI-beat-maker.git

# Install the required dependencies
pip install -r requirements-dev.txt
```
Note: This project is built and tested on Python `3.10`.
Visit this Google Drive link or use the command below to download the model:

(To download the model using the Kaggle CLI, follow the Kaggle API installation and authentication instructions to set it up first.)
```bash
# Download the model from Kaggle and untar the model file
kaggle models instances versions download huy1711/model.pt/pyTorch/v1/1
tar -xvf model.pt.tar.gz

# Move the model to the Triton model_repository folder
mv model.pt ./deploy/music_embedding/model_repository/neuralfp/1/
```
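Triton's PyTorch backend expects a TorchScript file, so a quick optional sanity check is to try loading the downloaded model. The assumption that `model.pt` is TorchScript comes from the Triton model-repository layout above, not from the project docs.

```python
import torch

# Assumes model.pt is a TorchScript module, as Triton's PyTorch backend requires.
model = torch.jit.load("model.pt", map_location="cpu")
print(model)  # prints the scripted module if the file loaded correctly
```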
For a quick demo, I use the pre-processed `fma_medium` dataset available on Kaggle Datasets. If you want to use the CLI, follow the Kaggle API installation and authentication instructions, then use the commands below to download the dataset:

(Note: You can skip this step if you already downloaded the dataset by following the preparation instructions in the `./train` folder.)
```bash
kaggle datasets download -d mimbres/neural-audio-fingerprint
mkdir ./datasets
unzip neural-audio-fingerprint.zip -d ./datasets/
```
After unzipping, the dataset should be available in the `./datasets/neural-audio-fp-dataset` folder. Then run the dataset preparation script:

```bash
python ./scripts/make_id2path_dict.py
```
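The script name suggests it builds a mapping from track IDs to audio file paths for later indexing. The sketch below shows the general idea under that assumption; the dataset root, file extension, and output filename are guesses for illustration, not the script's actual behavior.

```python
import json
from pathlib import Path

# Walk the unzipped dataset and map each track ID (file stem) to its path.
# Root directory, extension, and output name are assumptions.
dataset_root = Path("./datasets/neural-audio-fp-dataset")
id2path = {p.stem: str(p) for p in dataset_root.rglob("*.wav")}

with open("id2path.json", "w") as f:
    json.dump(id2path, f)
print(f"Indexed {len(id2path)} audio files")
```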
This project supports deployment using `docker-compose`. For installation instructions, please visit https://docs.docker.com/compose/install/.
If you want to change the Milvus database volume directory (default is `./volumes`), change the path in the `.env.example` file and rename the file to `.env`.
To start the application, run the following command:
```bash
docker-compose up -d
```
To add the `fma_medium` demo dataset to the Milvus vector DB, use the following command:

```bash
bash ./scripts/prepare_milvus.sh
```
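Conceptually, populating Milvus means creating a collection with a vector field, inserting the extracted fingerprints, and building an index. The sketch below illustrates that flow with pymilvus; the collection name, schema, embedding dimension, and index parameters are assumptions for illustration, not the script's actual configuration.

```python
import numpy as np
from pymilvus import (
    Collection, CollectionSchema, DataType, FieldSchema, connections,
)

connections.connect(host="localhost", port="19530")

# Illustrative schema: the embedding dimension (128) is an assumption.
fields = [
    FieldSchema("id", DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema("song_id", DataType.VARCHAR, max_length=64),
    FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=128),
]
collection = Collection("music_fingerprints", CollectionSchema(fields))

# Insert a batch of (song_id, embedding) pairs, then build an index.
embeddings = np.random.rand(10, 128).astype(np.float32)  # placeholder data
collection.insert([["demo_song"] * 10, embeddings.tolist()])
collection.create_index(
    "embedding",
    {"index_type": "IVF_FLAT", "metric_type": "IP", "params": {"nlist": 128}},
)
collection.flush()
```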
The application will be available at http://localhost:8081.
To get the Suno `Session ID` and `Cookie`, follow these steps:

1. Open the browser developer tools with `F12` or Right click + choose "Inspect".
2. Go to the `Network` tab and follow the picture below.

Note: Due to the limited number of songs in the vector database, positive music search cases are rare. If you want to test positive cases, please use the songs inside the `./datasets/neural-audio-fp-dataset/music/test-query-db-500-30s/query` folder, which includes perturbed versions of the songs in the database. Otherwise, you will need to add more songs to the database (see `./scripts/milvusdb_manage/add_embedding_offline.py`).
Video demo: Google Drive link
If you find any issues or have suggestions for improvements, please feel free to open an issue or send me an email via nguyenduchuy1711@gmail.com.
This project was developed by @Huy1711 and uses the Suno API to generate and fetch music data.