Katehuuh closed this issue 3 months ago.
I have successfully fine-tuned stabilityai/stable-audio-open-1.0 using 32GB of VRAM and 12 `*.wav` files, each 30 s long at 44100 Hz:

`python train.py --dataset-config dataset_config.json --model-config stable-audio-open-1.0\model_config.json --name wonder_pop_dataset --pretrained-ckpt-path stable-audio-open-1.0\model.safetensors --batch-size 1 --checkpoint-every 1 --save-dir saved_ckpt`
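Before training, it can help to confirm that every file really matches the format described above (44100 Hz, 30 s). Below is a minimal sketch using Python's stdlib `wave` module; it assumes integer PCM WAV files (`wave` cannot read 32-bit float WAVs), and `check_wavs` and its tolerance are hypothetical names/values, not part of stable-audio-tools.

```python
import wave
from pathlib import Path

# Expected format, taken from the dataset description above (44100 Hz, 30 s clips).
EXPECTED_RATE = 44100
EXPECTED_SECONDS = 30.0


def check_wavs(dataset_dir, rate=EXPECTED_RATE, seconds=EXPECTED_SECONDS, tol=0.05):
    """Return (filename, problem) tuples for WAVs whose rate or length don't match."""
    problems = []
    for path in sorted(Path(dataset_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wf:
            sr = wf.getframerate()
            duration = wf.getnframes() / sr  # length in seconds
        if sr != rate:
            problems.append((path.name, f"sample rate {sr} Hz"))
        if abs(duration - seconds) > tol:
            problems.append((path.name, f"duration {duration:.2f} s"))
    return problems
```

Usage: `print(check_wavs("path/to/wavs"))`; an empty list means every file matches.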
@Katehuuh would you mind explaining how/where you actually incorporate the CLAP embeddings? Do you run CLAP inference each time with your `custom_metadata.py`, or perhaps elsewhere?
I've just fine-tuned from stable-audio-open-1.0\model_config.json rather than pretraining from scratch, which requires CLAP to be set: https://github.com/Stability-AI/stable-audio-tools/blob/b51af8b60a0e619780e6be5cd35bd2525073ec52/stable_audio_tools/configs/model_configs/txt2audio/stable_audio_1_0.json#L45
The stable_audio_1_0.json config in Stability-AI/stable-audio-tools is similar to model_config.json, but it requires a path to `clap.ckpt`, which is not included. It is possibly music_audioset_epoch_15_esc_90.14.pt from LAION-AI/CLAP (see #44, #50): https://github.com/Stability-AI/stable-audio-tools/blob/b51af8b60a0e619780e6be5cd35bd2525073ec52/stable_audio_tools/configs/model_configs/txt2audio/stable_audio_1_0.json#L45

Install of b51af8b
OS: `Windows, Python 3.10.8, CUDA 11.8`

```cmd
python -m venv venv
call venv\Scripts\activate
git clone https://github.com/Stability-AI/stable-audio-tools.git
cd stable-audio-tools
git clone https://huggingface.co/stabilityai/stable-audio-open-1.0
cd stable-audio-open-1.0
:: download dataset.tar from https://drive.google.com/file/d/16J1CVu7EZPD_22FxitZ0TpOd__FwzOmx into this folder
:: we are already inside stable-audio-open-1.0, so extract in place
tar -xvf dataset.tar
cd ..
pip install stable-audio-tools
pip install .
:: reinstall PyTorch with the CUDA 11.8 wheels
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
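After the install, a quick sanity check can confirm that the packages import and that the CUDA 11.8 build of PyTorch actually sees the GPU. This is a hypothetical helper, not part of stable-audio-tools; it only uses `importlib.util.find_spec`, `torch.cuda.is_available()` and `torch.version.cuda`.

```python
import importlib.util


def check_environment(packages=("torch", "torchaudio", "stable_audio_tools")):
    """Report which packages are importable and, if torch is present, whether CUDA is visible."""
    report = {name: importlib.util.find_spec(name) is not None for name in packages}
    if report.get("torch"):
        import torch  # only imported when it is actually installed

        report["cuda"] = torch.cuda.is_available()
        # For the cu118 wheels this should read "11.8"; None means a CPU-only build.
        report["torch_cuda_version"] = torch.version.cuda
    return report
```

Usage: `print(check_environment())` inside the activated venv.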
Instead of replicating the training in Stability-AI/stable-audio-tools, I will fine-tune.
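One way to see why fine-tuning from stable-audio-open-1.0's model_config.json sidesteps the `clap.ckpt` question is to look at which text conditioner each config declares. Stable Audio Open is, as far as I know, conditioned on a T5 text encoder rather than CLAP. The sketch below assumes the `model -> conditioning -> configs[*]` layout with `id`/`type` keys used by the stable-audio-tools configs, and that CLAP conditioners have a type starting with `clap` (e.g. `clap_text`); verify both against your copy of the repo. The helper names are hypothetical.

```python
def list_conditioners(model_config):
    """Return (id, type) pairs from a stable-audio-tools style model config dict.

    Assumes the layout model -> conditioning -> configs[*] with "id" and "type" keys.
    """
    configs = model_config.get("model", {}).get("conditioning", {}).get("configs", [])
    return [(c.get("id"), c.get("type")) for c in configs]


def needs_clap_checkpoint(model_config):
    """True if any conditioner type looks CLAP-based (assumed prefix "clap")."""
    return any(str(t).startswith("clap") for _, t in list_conditioners(model_config))
```

Usage: load a config with `json.load(open("stable-audio-open-1.0/model_config.json", encoding="utf-8"))` and pass the resulting dict to both functions.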
Error log (`Missing key(s) in state_dict`):
```cmd
Found 791 files
C:\stable-audio-tools\venv\lib\site-packages\torch\nn\utils\weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
Traceback (most recent call last):
  File "C:\stable-audio-tools\train.py", line 128, in
```
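A "Missing key(s) in state_dict" error usually means the checkpoint's key names don't line up with what the model expects, for example when a checkpoint is passed to the wrong flag or carries an extra prefix. This is a generic diagnosis sketch, not a claim about what caused this particular log. `diff_state_dict_keys` is a hypothetical helper that compares two key sets and tries to detect a common prefix such as `model.`.

```python
def diff_state_dict_keys(model_keys, ckpt_keys):
    """Compare expected model keys with checkpoint keys.

    Returns (missing, unexpected, prefix_hint), where prefix_hint is a prefix that,
    stripped from every checkpoint key that has it, makes the key sets match (or None).
    """
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    missing = sorted(model_keys - ckpt_keys)
    unexpected = sorted(ckpt_keys - model_keys)
    prefix_hint = None
    if missing and unexpected:
        # Try dotted prefixes of the first unexpected key, e.g. "model." or "model.model."
        parts = unexpected[0].split(".")
        for i in range(1, len(parts)):
            prefix = ".".join(parts[:i]) + "."
            stripped = {k[len(prefix):] if k.startswith(prefix) else k for k in ckpt_keys}
            if stripped == model_keys:
                prefix_hint = prefix
                break
    return missing, unexpected, prefix_hint
```

With a real model, the checkpoint keys could come from `safetensors.torch.load_file(path).keys()` and the expected ones from `model.state_dict().keys()`. Alternatively, `model.load_state_dict(sd, strict=False)` returns the `missing_keys` and `unexpected_keys` lists directly.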
Edit 1: I had an issue with `defaults.ini` and confusion between `--ckpt-path`, `--pretrained-ckpt-path` and `--pretransform-ckpt-path` (#87).

Having a `.txt` file with the same name next to each `.wav` would be much simpler than configuring the dataset. Such `.wav` + `.txt` pairs can be handled in `custom_metadata.py` like this:
```python
import os


def get_custom_metadata(info, audio):
    # Get the filename from the info dictionary
    filename = os.path.basename(info["path"])
    # Extract the base name without extension
    base_name, _ = os.path.splitext(filename)
    # Construct the path to the corresponding text file
    txt_file_path = os.path.join(os.path.dirname(info["path"]), f"{base_name}.txt")

    if os.path.exists(txt_file_path):
        with open(txt_file_path, "r", encoding="utf-8") as f:
            prompt = f.read().strip()
    else:
        prompt = "No prompt available"

    return {"prompt": prompt}
```
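To hook a `custom_metadata.py` like the one above into training, the `--dataset-config` file has to point at both the audio folder and the module. The sketch below builds such a config; the key names (`dataset_type: "audio_dir"`, `datasets[*].id/path/custom_metadata_module`, `random_crop`) are my reading of the stable-audio-tools dataset docs and should be double-checked, and the folder name and dataset id are hypothetical. Relative paths are presumably resolved against the directory you launch `train.py` from; absolute paths are the safer choice if in doubt.

```python
import json
from pathlib import Path


def make_dataset_config(audio_dir, metadata_module, dataset_id="my_audio"):
    """Build an "audio_dir" dataset config dict (key names assumed from the repo docs)."""
    return {
        "dataset_type": "audio_dir",
        "datasets": [
            {
                "id": dataset_id,  # hypothetical id
                # as_posix() turns Windows backslashes into forward slashes
                "path": Path(audio_dir).as_posix(),
                "custom_metadata_module": Path(metadata_module).as_posix(),
            }
        ],
        "random_crop": True,
    }
```

Usage: `print(json.dumps(make_dataset_config("dataset", "custom_metadata.py"), indent=4))` and save the output as `dataset_config.json`.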
I'd kindly ask for a real relative path instead of the placeholder path in the instructions: https://github.com/Stability-AI/stable-audio-tools/blob/b51af8b60a0e619780e6be5cd35bd2525073ec52/README.md?plain=1#L63

To make fine-tuning easy to replicate, it would help to include a simple example with two sound files and a config.

How to caption from sound to text? Options: lyramakesmusic/clap-interrogator, and key/bpm using librosa.