Stability-AI / stable-audio-tools

Generative models for conditional audio generation
MIT License

Training - Missing key(s) in state_dict: #99

Closed Katehuuh closed 3 months ago

Katehuuh commented 3 months ago

Stability-AI/stable-audio-tools includes a model_config.json similar to stable_audio_1_0.json. However, stable_audio_1_0.json requires a path to clap.ckpt, which is not included; it is possibly music_audioset_epoch_15_esc_90.14.pt from LAION-AI/CLAP (see #44, #50): https://github.com/Stability-AI/stable-audio-tools/blob/b51af8b60a0e619780e6be5cd35bd2525073ec52/stable_audio_tools/configs/model_configs/txt2audio/stable_audio_1_0.json#L45
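To see which external checkpoint files a model config expects before training starts, one option is to walk the JSON and list every key whose name contains `ckpt`. This is a minimal sketch, not part of stable-audio-tools: the helper `find_ckpt_entries` and the embedded config fragment are hypothetical, and the key name `clap_ckpt_path` is an assumption about how the config refers to the CLAP checkpoint.

```python
import json
from typing import Any, Iterator


def find_ckpt_entries(node: Any, path: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (json_path, value) for every dict key whose name contains 'ckpt'."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if "ckpt" in key.lower():
                yield child, value
            # Keep descending so nested configs are covered too.
            yield from find_ckpt_entries(value, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_ckpt_entries(value, f"{path}[{index}]")


# Hypothetical fragment for illustration only; the real config may be laid out differently.
example_config = json.loads("""
{
  "model": {
    "conditioning": {
      "configs": [
        {"id": "prompt", "type": "clap_text",
         "config": {"clap_ckpt_path": "/path/to/clap.ckpt"}}
      ]
    }
  }
}
""")

for json_path, value in find_ckpt_entries(example_config):
    print(f"{json_path} = {value}")
```

To check a real file, replace the embedded fragment with `json.load(open(...))` on the config in question; any placeholder path it prints has to point at a file that exists locally.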

Install of b51af8b
OS: `Windows, Python 3.10.8, CUDA 11.8`

```cmd
python -m venv venv
call venv\Scripts\activate
git clone https://github.com/Stability-AI/stable-audio-tools.git
cd stable-audio-tools
git clone https://huggingface.co/stabilityai/stable-audio-open-1.0
cd stable-audio-open-1.0
:: dataset.tar https://drive.google.com/file/d/16J1CVu7EZPD_22FxitZ0TpOd__FwzOmx
tar -xvf stable-audio-open-1.0/dataset.tar -C .\stable-audio-open-1.0
cd ..
pip install stable-audio-tools
pip install .
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```

Instead of replicating the training of Stability-AI/stable-audio-tools, I am fine-tuning:

python train.py --dataset-config stable-audio-open-1.0\dataset\metadata\dataset_root.json --model-config .\stable-audio-open-1.0\model_config.json --name my_audio_dataset --pretransform-ckpt-path .\stable-audio-open-1.0\model.safetensors
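The command above passes `model.safetensors` to `--pretransform-ckpt-path`, and the traceback below shows that file being loaded strictly into the `AudioAutoencoder` only. A quick way to read a long "Missing key(s) / Unexpected key(s)" report is to group both sides by their top-level prefix. The sketch below is a hypothetical helper (`summarize_key_mismatch` is not part of stable-audio-tools or PyTorch), and it uses a handful of key names copied from the log rather than a real checkpoint.

```python
from collections import Counter
from typing import Iterable


def summarize_key_mismatch(expected: Iterable[str], found: Iterable[str]) -> dict:
    """Compare two sets of state_dict key names and group differences by top-level prefix."""
    expected_set, found_set = set(expected), set(found)
    missing = sorted(expected_set - found_set)      # the module wants these, the file lacks them
    unexpected = sorted(found_set - expected_set)   # the file has these, the module does not
    return {
        "missing": missing,
        "unexpected": unexpected,
        "missing_by_prefix": dict(Counter(k.split(".", 1)[0] for k in missing)),
        "unexpected_by_prefix": dict(Counter(k.split(".", 1)[0] for k in unexpected)),
    }


# A few key names taken from the error log, standing in for the full lists.
expected_keys = [
    "encoder.layers.0.bias",
    "encoder.layers.0.weight_g",
    "decoder.layers.0.bias",
]
checkpoint_keys = [
    "conditioner.conditioners.seconds_start.embedder.embedding.1.bias",
    "model.model.preprocess_conv.weight",
    "model.model.postprocess_conv.weight",
]

report = summarize_key_mismatch(expected_keys, checkpoint_keys)
print("missing by prefix:   ", report["missing_by_prefix"])
print("unexpected by prefix:", report["unexpected_by_prefix"])
```

When every expected key is missing and the file's keys sit under prefixes such as `model` and `conditioner`, the file is likely a full diffusion-model checkpoint rather than an autoencoder-only one. The stable-audio-tools README describes `--pretrained-ckpt-path` for starting a run from an unwrapped full model, so that may be the flag intended here; this is a reading of the log, not a confirmed resolution.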

The error log:

Missing key(s) in state_dict (click to expand full log)
```cmd
Found 791 files
C:\stable-audio-tools\venv\lib\site-packages\torch\nn\utils\weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
Traceback (most recent call last):
  File "C:\stable-audio-tools\train.py", line 128, in <module>
    main()
  File "C:\stable-audio-tools\train.py", line 63, in main
    model.pretransform.load_state_dict(load_ckpt_state_dict(args.pretransform_ckpt_path))
  File "C:\stable-audio-tools\stable_audio_tools\models\pretransforms.py", line 90, in load_state_dict
    self.model.load_state_dict(state_dict, strict=strict)
  File "C:\stable-audio-tools\venv\lib\site-packages\torch\nn\modules\module.py", line 2189, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for AudioAutoencoder:
	Missing key(s) in state_dict: "encoder.layers.0.bias", "encoder.layers.0.weight_g", "encoder.layers.0.weight_v", "encoder.layers.1.layers.0.layers.0.alpha", "encoder.layers.1.layers.0.layers.0.beta", "encoder.layers.1.layers.0.layers.1.bias", "encoder.layers.1.layers.0.layers.1.weight_g", "encoder.layers.1.layers.0.layers.1.weight_v", "encoder.layers.1.layers.0.layers.2.alpha", "encoder.layers.1.layers.0.layers.2.beta", "encoder.layers.1.layers.0.layers.3.bias", "encoder.layers.1.layers.0.layers.3.weight_g", "encoder.layers.1.layers.0.layers.3.weight_v", "encoder.layers.1.layers.1.layers.0.alpha", "encoder.layers.1.layers.1.layers.0.beta", "encoder.layers.1.layers.1.layers.1.bias", "encoder.layers.1.layers.1.layers.1.weight_g", "encoder.layers.1.layers.1.layers.1.weight_v", "encoder.layers.1.layers.1.layers.2.alpha", "encoder.layers.1.layers.1.layers.2.beta", "encoder.layers.1.layers.1.layers.3.bias", "encoder.layers.1.layers.1.layers.3.weight_g", "encoder.layers.1.layers.1.layers.3.weight_v", 
"encoder.layers.1.layers.2.layers.0.alpha", "encoder.layers.1.layers.2.layers.0.beta", "encoder.layers.1.layers.2.layers.1.bias", "encoder.layers.1.layers.2.layers.1.weight_g", "encoder.layers.1.layers.2.layers.1.weight_v", "encoder.layers.1.layers.2.layers.2.alpha", "encoder.layers.1.layers.2.layers.2.beta", "encoder.layers.1.layers.2.layers.3.bias", "encoder.layers.1.layers.2.layers.3.weight_g", "encoder.layers.1.layers.2.layers.3.weight_v", "encoder.layers.1.layers.3.alpha", "encoder.layers.1.layers.3.beta", "encoder.layers.1.layers.4.bias", "encoder.layers.1.layers.4.weight_g", "encoder.layers.1.layers.4.weight_v", "encoder.layers.2.layers.0.layers.0.alpha", "encoder.layers.2.layers.0.layers.0.beta", "encoder.layers.2.layers.0.layers.1.bias", "encoder.layers.2.layers.0.layers.1.weight_g", "encoder.layers.2.layers.0.layers.1.weight_v", "encoder.layers.2.layers.0.layers.2.alpha", "encoder.layers.2.layers.0.layers.2.beta", "encoder.layers.2.layers.0.layers.3.bias", "encoder.layers.2.layers.0.layers.3.weight_g", "encoder.layers.2.layers.0.layers.3.weight_v", "encoder.layers.2.layers.1.layers.0.alpha", "encoder.layers.2.layers.1.layers.0.beta", "encoder.layers.2.layers.1.layers.1.bias", "encoder.layers.2.layers.1.layers.1.weight_g", "encoder.layers.2.layers.1.layers.1.weight_v", "encoder.layers.2.layers.1.layers.2.alpha", "encoder.layers.2.layers.1.layers.2.beta", "encoder.layers.2.layers.1.layers.3.bias", "encoder.layers.2.layers.1.layers.3.weight_g", "encoder.layers.2.layers.1.layers.3.weight_v", "encoder.layers.2.layers.2.layers.0.alpha", "encoder.layers.2.layers.2.layers.0.beta", "encoder.layers.2.layers.2.layers.1.bias", "encoder.layers.2.layers.2.layers.1.weight_g", "encoder.layers.2.layers.2.layers.1.weight_v", "encoder.layers.2.layers.2.layers.2.alpha", "encoder.layers.2.layers.2.layers.2.beta", "encoder.layers.2.layers.2.layers.3.bias", "encoder.layers.2.layers.2.layers.3.weight_g", "encoder.layers.2.layers.2.layers.3.weight_v", 
"encoder.layers.2.layers.3.alpha", "encoder.layers.2.layers.3.beta", "encoder.layers.2.layers.4.bias", "encoder.layers.2.layers.4.weight_g", "encoder.layers.2.layers.4.weight_v", "encoder.layers.3.layers.0.layers.0.alpha", "encoder.layers.3.layers.0.layers.0.beta", "encoder.layers.3.layers.0.layers.1.bias", "encoder.layers.3.layers.0.layers.1.weight_g", "encoder.layers.3.layers.0.layers.1.weight_v", "encoder.layers.3.layers.0.layers.2.alpha", "encoder.layers.3.layers.0.layers.2.beta", "encoder.layers.3.layers.0.layers.3.bias", "encoder.layers.3.layers.0.layers.3.weight_g", "encoder.layers.3.layers.0.layers.3.weight_v", "encoder.layers.3.layers.1.layers.0.alpha", "encoder.layers.3.layers.1.layers.0.beta", "encoder.layers.3.layers.1.layers.1.bias", "encoder.layers.3.layers.1.layers.1.weight_g", "encoder.layers.3.layers.1.layers.1.weight_v", "encoder.layers.3.layers.1.layers.2.alpha", "encoder.layers.3.layers.1.layers.2.beta", "encoder.layers.3.layers.1.layers.3.bias", "encoder.layers.3.layers.1.layers.3.weight_g", "encoder.layers.3.layers.1.layers.3.weight_v", "encoder.layers.3.layers.2.layers.0.alpha", "encoder.layers.3.layers.2.layers.0.beta", "encoder.layers.3.layers.2.layers.1.bias", "encoder.layers.3.layers.2.layers.1.weight_g", "encoder.layers.3.layers.2.layers.1.weight_v", "encoder.layers.3.layers.2.layers.2.alpha", "encoder.layers.3.layers.2.layers.2.beta", "encoder.layers.3.layers.2.layers.3.bias", "encoder.layers.3.layers.2.layers.3.weight_g", "encoder.layers.3.layers.2.layers.3.weight_v", "encoder.layers.3.layers.3.alpha", "encoder.layers.3.layers.3.beta", "encoder.layers.3.layers.4.bias", "encoder.layers.3.layers.4.weight_g", "encoder.layers.3.layers.4.weight_v", "encoder.layers.4.layers.0.layers.0.alpha", "encoder.layers.4.layers.0.layers.0.beta", "encoder.layers.4.layers.0.layers.1.bias", "encoder.layers.4.layers.0.layers.1.weight_g", "encoder.layers.4.layers.0.layers.1.weight_v", "encoder.layers.4.layers.0.layers.2.alpha", 
"encoder.layers.4.layers.0.layers.2.beta", "encoder.layers.4.layers.0.layers.3.bias", "encoder.layers.4.layers.0.layers.3.weight_g", "encoder.layers.4.layers.0.layers.3.weight_v", "encoder.layers.4.layers.1.layers.0.alpha", "encoder.layers.4.layers.1.layers.0.beta", "encoder.layers.4.layers.1.layers.1.bias", "encoder.layers.4.layers.1.layers.1.weight_g", "encoder.layers.4.layers.1.layers.1.weight_v", "encoder.layers.4.layers.1.layers.2.alpha", "encoder.layers.4.layers.1.layers.2.beta", "encoder.layers.4.layers.1.layers.3.bias", "encoder.layers.4.layers.1.layers.3.weight_g", "encoder.layers.4.layers.1.layers.3.weight_v", "encoder.layers.4.layers.2.layers.0.alpha", "encoder.layers.4.layers.2.layers.0.beta", "encoder.layers.4.layers.2.layers.1.bias", "encoder.layers.4.layers.2.layers.1.weight_g", "encoder.layers.4.layers.2.layers.1.weight_v", "encoder.layers.4.layers.2.layers.2.alpha", "encoder.layers.4.layers.2.layers.2.beta", "encoder.layers.4.layers.2.layers.3.bias", "encoder.layers.4.layers.2.layers.3.weight_g", "encoder.layers.4.layers.2.layers.3.weight_v", "encoder.layers.4.layers.3.alpha", "encoder.layers.4.layers.3.beta", "encoder.layers.4.layers.4.bias", "encoder.layers.4.layers.4.weight_g", "encoder.layers.4.layers.4.weight_v", "encoder.layers.5.layers.0.layers.0.alpha", "encoder.layers.5.layers.0.layers.0.beta", "encoder.layers.5.layers.0.layers.1.bias", "encoder.layers.5.layers.0.layers.1.weight_g", "encoder.layers.5.layers.0.layers.1.weight_v", "encoder.layers.5.layers.0.layers.2.alpha", "encoder.layers.5.layers.0.layers.2.beta", "encoder.layers.5.layers.0.layers.3.bias", "encoder.layers.5.layers.0.layers.3.weight_g", "encoder.layers.5.layers.0.layers.3.weight_v", "encoder.layers.5.layers.1.layers.0.alpha", "encoder.layers.5.layers.1.layers.0.beta", "encoder.layers.5.layers.1.layers.1.bias", "encoder.layers.5.layers.1.layers.1.weight_g", "encoder.layers.5.layers.1.layers.1.weight_v", "encoder.layers.5.layers.1.layers.2.alpha", 
"encoder.layers.5.layers.1.layers.2.beta", "encoder.layers.5.layers.1.layers.3.bias", "encoder.layers.5.layers.1.layers.3.weight_g", "encoder.layers.5.layers.1.layers.3.weight_v", "encoder.layers.5.layers.2.layers.0.alpha", "encoder.layers.5.layers.2.layers.0.beta", "encoder.layers.5.layers.2.layers.1.bias", "encoder.layers.5.layers.2.layers.1.weight_g", "encoder.layers.5.layers.2.layers.1.weight_v", "encoder.layers.5.layers.2.layers.2.alpha", "encoder.layers.5.layers.2.layers.2.beta", "encoder.layers.5.layers.2.layers.3.bias", "encoder.layers.5.layers.2.layers.3.weight_g", "encoder.layers.5.layers.2.layers.3.weight_v", "encoder.layers.5.layers.3.alpha", "encoder.layers.5.layers.3.beta", "encoder.layers.5.layers.4.bias", "encoder.layers.5.layers.4.weight_g", "encoder.layers.5.layers.4.weight_v", "encoder.layers.6.alpha", "encoder.layers.6.beta", "encoder.layers.7.bias", "encoder.layers.7.weight_g", "encoder.layers.7.weight_v", "decoder.layers.0.bias", "decoder.layers.0.weight_g", "decoder.layers.0.weight_v", "decoder.layers.1.layers.0.alpha", "decoder.layers.1.layers.0.beta", "decoder.layers.1.layers.1.bias", "decoder.layers.1.layers.1.weight_g", "decoder.layers.1.layers.1.weight_v", "decoder.layers.1.layers.2.layers.0.alpha", "decoder.layers.1.layers.2.layers.0.beta", "decoder.layers.1.layers.2.layers.1.bias", "decoder.layers.1.layers.2.layers.1.weight_g", "decoder.layers.1.layers.2.layers.1.weight_v", "decoder.layers.1.layers.2.layers.2.alpha", "decoder.layers.1.layers.2.layers.2.beta", "decoder.layers.1.layers.2.layers.3.bias", "decoder.layers.1.layers.2.layers.3.weight_g", "decoder.layers.1.layers.2.layers.3.weight_v", "decoder.layers.1.layers.3.layers.0.alpha", "decoder.layers.1.layers.3.layers.0.beta", "decoder.layers.1.layers.3.layers.1.bias", "decoder.layers.1.layers.3.layers.1.weight_g", "decoder.layers.1.layers.3.layers.1.weight_v", "decoder.layers.1.layers.3.layers.2.alpha", "decoder.layers.1.layers.3.layers.2.beta", 
"decoder.layers.1.layers.3.layers.3.bias", "decoder.layers.1.layers.3.layers.3.weight_g", "decoder.layers.1.layers.3.layers.3.weight_v", "decoder.layers.1.layers.4.layers.0.alpha", "decoder.layers.1.layers.4.layers.0.beta", "decoder.layers.1.layers.4.layers.1.bias", "decoder.layers.1.layers.4.layers.1.weight_g", "decoder.layers.1.layers.4.layers.1.weight_v", "decoder.layers.1.layers.4.layers.2.alpha", "decoder.layers.1.layers.4.layers.2.beta", "decoder.layers.1.layers.4.layers.3.bias", "decoder.layers.1.layers.4.layers.3.weight_g", "decoder.layers.1.layers.4.layers.3.weight_v", "decoder.layers.2.layers.0.alpha", "decoder.layers.2.layers.0.beta", "decoder.layers.2.layers.1.bias", "decoder.layers.2.layers.1.weight_g", "decoder.layers.2.layers.1.weight_v", "decoder.layers.2.layers.2.layers.0.alpha", "decoder.layers.2.layers.2.layers.0.beta", "decoder.layers.2.layers.2.layers.1.bias", "decoder.layers.2.layers.2.layers.1.weight_g", "decoder.layers.2.layers.2.layers.1.weight_v", "decoder.layers.2.layers.2.layers.2.alpha", "decoder.layers.2.layers.2.layers.2.beta", "decoder.layers.2.layers.2.layers.3.bias", "decoder.layers.2.layers.2.layers.3.weight_g", "decoder.layers.2.layers.2.layers.3.weight_v", "decoder.layers.2.layers.3.layers.0.alpha", "decoder.layers.2.layers.3.layers.0.beta", "decoder.layers.2.layers.3.layers.1.bias", "decoder.layers.2.layers.3.layers.1.weight_g", "decoder.layers.2.layers.3.layers.1.weight_v", "decoder.layers.2.layers.3.layers.2.alpha", "decoder.layers.2.layers.3.layers.2.beta", "decoder.layers.2.layers.3.layers.3.bias", "decoder.layers.2.layers.3.layers.3.weight_g", "decoder.layers.2.layers.3.layers.3.weight_v", "decoder.layers.2.layers.4.layers.0.alpha", "decoder.layers.2.layers.4.layers.0.beta", "decoder.layers.2.layers.4.layers.1.bias", "decoder.layers.2.layers.4.layers.1.weight_g", "decoder.layers.2.layers.4.layers.1.weight_v", "decoder.layers.2.layers.4.layers.2.alpha", "decoder.layers.2.layers.4.layers.2.beta", 
"decoder.layers.2.layers.4.layers.3.bias", "decoder.layers.2.layers.4.layers.3.weight_g", "decoder.layers.2.layers.4.layers.3.weight_v", "decoder.layers.3.layers.0.alpha", "decoder.layers.3.layers.0.beta", "decoder.layers.3.layers.1.bias", "decoder.layers.3.layers.1.weight_g", "decoder.layers.3.layers.1.weight_v", "decoder.layers.3.layers.2.layers.0.alpha", "decoder.layers.3.layers.2.layers.0.beta", "decoder.layers.3.layers.2.layers.1.bias", "decoder.layers.3.layers.2.layers.1.weight_g", "decoder.layers.3.layers.2.layers.1.weight_v", "decoder.layers.3.layers.2.layers.2.alpha", "decoder.layers.3.layers.2.layers.2.beta", "decoder.layers.3.layers.2.layers.3.bias", "decoder.layers.3.layers.2.layers.3.weight_g", "decoder.layers.3.layers.2.layers.3.weight_v", "decoder.layers.3.layers.3.layers.0.alpha", "decoder.layers.3.layers.3.layers.0.beta", "decoder.layers.3.layers.3.layers.1.bias", "decoder.layers.3.layers.3.layers.1.weight_g", "decoder.layers.3.layers.3.layers.1.weight_v", "decoder.layers.3.layers.3.layers.2.alpha", "decoder.layers.3.layers.3.layers.2.beta", "decoder.layers.3.layers.3.layers.3.bias", "decoder.layers.3.layers.3.layers.3.weight_g", "decoder.layers.3.layers.3.layers.3.weight_v", "decoder.layers.3.layers.4.layers.0.alpha", "decoder.layers.3.layers.4.layers.0.beta", "decoder.layers.3.layers.4.layers.1.bias", "decoder.layers.3.layers.4.layers.1.weight_g", "decoder.layers.3.layers.4.layers.1.weight_v", "decoder.layers.3.layers.4.layers.2.alpha", "decoder.layers.3.layers.4.layers.2.beta", "decoder.layers.3.layers.4.layers.3.bias", "decoder.layers.3.layers.4.layers.3.weight_g", "decoder.layers.3.layers.4.layers.3.weight_v", "decoder.layers.4.layers.0.alpha", "decoder.layers.4.layers.0.beta", "decoder.layers.4.layers.1.bias", "decoder.layers.4.layers.1.weight_g", "decoder.layers.4.layers.1.weight_v", "decoder.layers.4.layers.2.layers.0.alpha", "decoder.layers.4.layers.2.layers.0.beta", "decoder.layers.4.layers.2.layers.1.bias", 
"decoder.layers.4.layers.2.layers.1.weight_g", "decoder.layers.4.layers.2.layers.1.weight_v", "decoder.layers.4.layers.2.layers.2.alpha", "decoder.layers.4.layers.2.layers.2.beta", "decoder.layers.4.layers.2.layers.3.bias", "decoder.layers.4.layers.2.layers.3.weight_g", "decoder.layers.4.layers.2.layers.3.weight_v", "decoder.layers.4.layers.3.layers.0.alpha", "decoder.layers.4.layers.3.layers.0.beta", "decoder.layers.4.layers.3.layers.1.bias", "decoder.layers.4.layers.3.layers.1.weight_g", "decoder.layers.4.layers.3.layers.1.weight_v", "decoder.layers.4.layers.3.layers.2.alpha", "decoder.layers.4.layers.3.layers.2.beta", "decoder.layers.4.layers.3.layers.3.bias", "decoder.layers.4.layers.3.layers.3.weight_g", "decoder.layers.4.layers.3.layers.3.weight_v", "decoder.layers.4.layers.4.layers.0.alpha", "decoder.layers.4.layers.4.layers.0.beta", "decoder.layers.4.layers.4.layers.1.bias", "decoder.layers.4.layers.4.layers.1.weight_g", "decoder.layers.4.layers.4.layers.1.weight_v", "decoder.layers.4.layers.4.layers.2.alpha", "decoder.layers.4.layers.4.layers.2.beta", "decoder.layers.4.layers.4.layers.3.bias", "decoder.layers.4.layers.4.layers.3.weight_g", "decoder.layers.4.layers.4.layers.3.weight_v", "decoder.layers.5.layers.0.alpha", "decoder.layers.5.layers.0.beta", "decoder.layers.5.layers.1.bias", "decoder.layers.5.layers.1.weight_g", "decoder.layers.5.layers.1.weight_v", "decoder.layers.5.layers.2.layers.0.alpha", "decoder.layers.5.layers.2.layers.0.beta", "decoder.layers.5.layers.2.layers.1.bias", "decoder.layers.5.layers.2.layers.1.weight_g", "decoder.layers.5.layers.2.layers.1.weight_v", "decoder.layers.5.layers.2.layers.2.alpha", "decoder.layers.5.layers.2.layers.2.beta", "decoder.layers.5.layers.2.layers.3.bias", "decoder.layers.5.layers.2.layers.3.weight_g", "decoder.layers.5.layers.2.layers.3.weight_v", "decoder.layers.5.layers.3.layers.0.alpha", "decoder.layers.5.layers.3.layers.0.beta", "decoder.layers.5.layers.3.layers.1.bias", 
"decoder.layers.5.layers.3.layers.1.weight_g", "decoder.layers.5.layers.3.layers.1.weight_v", "decoder.layers.5.layers.3.layers.2.alpha", "decoder.layers.5.layers.3.layers.2.beta", "decoder.layers.5.layers.3.layers.3.bias", "decoder.layers.5.layers.3.layers.3.weight_g", "decoder.layers.5.layers.3.layers.3.weight_v", "decoder.layers.5.layers.4.layers.0.alpha", "decoder.layers.5.layers.4.layers.0.beta", "decoder.layers.5.layers.4.layers.1.bias", "decoder.layers.5.layers.4.layers.1.weight_g", "decoder.layers.5.layers.4.layers.1.weight_v", "decoder.layers.5.layers.4.layers.2.alpha", "decoder.layers.5.layers.4.layers.2.beta", "decoder.layers.5.layers.4.layers.3.bias", "decoder.layers.5.layers.4.layers.3.weight_g", "decoder.layers.5.layers.4.layers.3.weight_v", "decoder.layers.6.alpha", "decoder.layers.6.beta", "decoder.layers.7.weight_g", "decoder.layers.7.weight_v". Unexpected key(s) in state_dict: "conditioner.conditioners.seconds_start.embedder.embedding.0.weights", "conditioner.conditioners.seconds_start.embedder.embedding.1.bias", "conditioner.conditioners.seconds_start.embedder.embedding.1.weight", "conditioner.conditioners.seconds_total.embedder.embedding.0.weights", "conditioner.conditioners.seconds_total.embedder.embedding.1.bias", "conditioner.conditioners.seconds_total.embedder.embedding.1.weight", "model.model.postprocess_conv.weight", "model.model.preprocess_conv.weight", "model.model.timestep_features.weight", "model.model.to_cond_embed.0.weight", "model.model.to_cond_embed.2.weight", "model.model.to_global_embed.0.weight", "model.model.to_global_embed.2.weight", "model.model.to_timestep_embed.0.bias", "model.model.to_timestep_embed.0.weight", "model.model.to_timestep_embed.2.bias", "model.model.to_timestep_embed.2.weight", "model.model.transformer.layers.0.cross_attend_norm.beta", "model.model.transformer.layers.0.cross_attend_norm.gamma", "model.model.transformer.layers.0.cross_attn.to_kv.weight", 
"model.model.transformer.layers.0.cross_attn.to_out.weight", "model.model.transformer.layers.0.cross_attn.to_q.weight", "model.model.transformer.layers.0.ff.ff.0.proj.bias", "model.model.transformer.layers.0.ff.ff.0.proj.weight", "model.model.transformer.layers.0.ff.ff.2.bias", "model.model.transformer.layers.0.ff.ff.2.weight", "model.model.transformer.layers.0.ff_norm.beta", "model.model.transformer.layers.0.ff_norm.gamma", "model.model.transformer.layers.0.pre_norm.beta", "model.model.transformer.layers.0.pre_norm.gamma", "model.model.transformer.layers.0.self_attn.to_out.weight", "model.model.transformer.layers.0.self_attn.to_qkv.weight", "model.model.transformer.layers.1.cross_attend_norm.beta", "model.model.transformer.layers.1.cross_attend_norm.gamma", "model.model.transformer.layers.1.cross_attn.to_kv.weight", "model.model.transformer.layers.1.cross_attn.to_out.weight", "model.model.transformer.layers.1.cross_attn.to_q.weight", "model.model.transformer.layers.1.ff.ff.0.proj.bias", "model.model.transformer.layers.1.ff.ff.0.proj.weight", "model.model.transformer.layers.1.ff.ff.2.bias", "model.model.transformer.layers.1.ff.ff.2.weight", "model.model.transformer.layers.1.ff_norm.beta", "model.model.transformer.layers.1.ff_norm.gamma", "model.model.transformer.layers.1.pre_norm.beta", "model.model.transformer.layers.1.pre_norm.gamma", "model.model.transformer.layers.1.self_attn.to_out.weight", "model.model.transformer.layers.1.self_attn.to_qkv.weight", "model.model.transformer.layers.10.cross_attend_norm.beta", "model.model.transformer.layers.10.cross_attend_norm.gamma", "model.model.transformer.layers.10.cross_attn.to_kv.weight", "model.model.transformer.layers.10.cross_attn.to_out.weight", "model.model.transformer.layers.10.cross_attn.to_q.weight", "model.model.transformer.layers.10.ff.ff.0.proj.bias", "model.model.transformer.layers.10.ff.ff.0.proj.weight", "model.model.transformer.layers.10.ff.ff.2.bias", "model.model.transformer.layers.10.ff.ff.2.weight", 
"model.model.transformer.layers.10.ff_norm.beta", "model.model.transformer.layers.10.ff_norm.gamma", "model.model.transformer.layers.10.pre_norm.beta", "model.model.transformer.layers.10.pre_norm.gamma", "model.model.transformer.layers.10.self_attn.to_out.weight", "model.model.transformer.layers.10.self_attn.to_qkv.weight", "model.model.transformer.layers.11.cross_attend_norm.beta", "model.model.transformer.layers.11.cross_attend_norm.gamma", "model.model.transformer.layers.11.cross_attn.to_kv.weight", "model.model.transformer.layers.11.cross_attn.to_out.weight", "model.model.transformer.layers.11.cross_attn.to_q.weight", "model.model.transformer.layers.11.ff.ff.0.proj.bias", "model.model.transformer.layers.11.ff.ff.0.proj.weight", "model.model.transformer.layers.11.ff.ff.2.bias", "model.model.transformer.layers.11.ff.ff.2.weight", "model.model.transformer.layers.11.ff_norm.beta", "model.model.transformer.layers.11.ff_norm.gamma", "model.model.transformer.layers.11.pre_norm.beta", "model.model.transformer.layers.11.pre_norm.gamma", "model.model.transformer.layers.11.self_attn.to_out.weight", "model.model.transformer.layers.11.self_attn.to_qkv.weight", "model.model.transformer.layers.12.cross_attend_norm.beta", "model.model.transformer.layers.12.cross_attend_norm.gamma", "model.model.transformer.layers.12.cross_attn.to_kv.weight", "model.model.transformer.layers.12.cross_attn.to_out.weight", "model.model.transformer.layers.12.cross_attn.to_q.weight", "model.model.transformer.layers.12.ff.ff.0.proj.bias", "model.model.transformer.layers.12.ff.ff.0.proj.weight", "model.model.transformer.layers.12.ff.ff.2.bias", "model.model.transformer.layers.12.ff.ff.2.weight", "model.model.transformer.layers.12.ff_norm.beta", "model.model.transformer.layers.12.ff_norm.gamma", "model.model.transformer.layers.12.pre_norm.beta", "model.model.transformer.layers.12.pre_norm.gamma", "model.model.transformer.layers.12.self_attn.to_out.weight", 
"model.model.transformer.layers.12.self_attn.to_qkv.weight", "model.model.transformer.layers.13.cross_attend_norm.beta", "model.model.transformer.layers.13.cross_attend_norm.gamma", "model.model.transformer.layers.13.cross_attn.to_kv.weight", "model.model.transformer.layers.13.cross_attn.to_out.weight", "model.model.transformer.layers.13.cross_attn.to_q.weight", "model.model.transformer.layers.13.ff.ff.0.proj.bias", "model.model.transformer.layers.13.ff.ff.0.proj.weight", "model.model.transformer.layers.13.ff.ff.2.bias", "model.model.transformer.layers.13.ff.ff.2.weight", "model.model.transformer.layers.13.ff_norm.beta", "model.model.transformer.layers.13.ff_norm.gamma", "model.model.transformer.layers.13.pre_norm.beta", "model.model.transformer.layers.13.pre_norm.gamma", "model.model.transformer.layers.13.self_attn.to_out.weight", "model.model.transformer.layers.13.self_attn.to_qkv.weight", "model.model.transformer.layers.14.cross_attend_norm.beta", "model.model.transformer.layers.14.cross_attend_norm.gamma", "model.model.transformer.layers.14.cross_attn.to_kv.weight", "model.model.transformer.layers.14.cross_attn.to_out.weight", "model.model.transformer.layers.14.cross_attn.to_q.weight", "model.model.transformer.layers.14.ff.ff.0.proj.bias", "model.model.transformer.layers.14.ff.ff.0.proj.weight", "model.model.transformer.layers.14.ff.ff.2.bias", "model.model.transformer.layers.14.ff.ff.2.weight", "model.model.transformer.layers.14.ff_norm.beta", "model.model.transformer.layers.14.ff_norm.gamma", "model.model.transformer.layers.14.pre_norm.beta", "model.model.transformer.layers.14.pre_norm.gamma", "model.model.transformer.layers.14.self_attn.to_out.weight", "model.model.transformer.layers.14.self_attn.to_qkv.weight", "model.model.transformer.layers.15.cross_attend_norm.beta", "model.model.transformer.layers.15.cross_attend_norm.gamma", "model.model.transformer.layers.15.cross_attn.to_kv.weight", "model.model.transformer.layers.15.cross_attn.to_out.weight", 
"model.model.transformer.layers.15.cross_attn.to_q.weight", "model.model.transformer.layers.15.ff.ff.0.proj.bias", "model.model.transformer.layers.15.ff.ff.0.proj.weight", "model.model.transformer.layers.15.ff.ff.2.bias", "model.model.transformer.layers.15.ff.ff.2.weight", "model.model.transformer.layers.15.ff_norm.beta", "model.model.transformer.layers.15.ff_norm.gamma", "model.model.transformer.layers.15.pre_norm.beta", "model.model.transformer.layers.15.pre_norm.gamma", "model.model.transformer.layers.15.self_attn.to_out.weight", "model.model.transformer.layers.15.self_attn.to_qkv.weight", "model.model.transformer.layers.16.cross_attend_norm.beta", "model.model.transformer.layers.16.cross_attend_norm.gamma", "model.model.transformer.layers.16.cross_attn.to_kv.weight", "model.model.transformer.layers.16.cross_attn.to_out.weight", "model.model.transformer.layers.16.cross_attn.to_q.weight", "model.model.transformer.layers.16.ff.ff.0.proj.bias", "model.model.transformer.layers.16.ff.ff.0.proj.weight", "model.model.transformer.layers.16.ff.ff.2.bias", "model.model.transformer.layers.16.ff.ff.2.weight", "model.model.transformer.layers.16.ff_norm.beta", "model.model.transformer.layers.16.ff_norm.gamma", "model.model.transformer.layers.16.pre_norm.beta", "model.model.transformer.layers.16.pre_norm.gamma", "model.model.transformer.layers.16.self_attn.to_out.weight", "model.model.transformer.layers.16.self_attn.to_qkv.weight", "model.model.transformer.layers.17.cross_attend_norm.beta", "model.model.transformer.layers.17.cross_attend_norm.gamma", "model.model.transformer.layers.17.cross_attn.to_kv.weight", "model.model.transformer.layers.17.cross_attn.to_out.weight", "model.model.transformer.layers.17.cross_attn.to_q.weight", "model.model.transformer.layers.17.ff.ff.0.proj.bias", "model.model.transformer.layers.17.ff.ff.0.proj.weight", "model.model.transformer.layers.17.ff.ff.2.bias", "model.model.transformer.layers.17.ff.ff.2.weight", 
"model.model.transformer.layers.17.ff_norm.beta", "model.model.transformer.layers.17.ff_norm.gamma", "model.model.transformer.layers.17.pre_norm.beta", "model.model.transformer.layers.17.pre_norm.gamma", "model.model.transformer.layers.17.self_attn.to_out.weight", "model.model.transformer.layers.17.self_attn.to_qkv.weight", "model.model.transformer.layers.18.cross_attend_norm.beta", "model.model.transformer.layers.18.cross_attend_norm.gamma", "model.model.transformer.layers.18.cross_attn.to_kv.weight", "model.model.transformer.layers.18.cross_attn.to_out.weight", "model.model.transformer.layers.18.cross_attn.to_q.weight", "model.model.transformer.layers.18.ff.ff.0.proj.bias", "model.model.transformer.layers.18.ff.ff.0.proj.weight", "model.model.transformer.layers.18.ff.ff.2.bias", "model.model.transformer.layers.18.ff.ff.2.weight", "model.model.transformer.layers.18.ff_norm.beta", "model.model.transformer.layers.18.ff_norm.gamma", "model.model.transformer.layers.18.pre_norm.beta", "model.model.transformer.layers.18.pre_norm.gamma", "model.model.transformer.layers.18.self_attn.to_out.weight", "model.model.transformer.layers.18.self_attn.to_qkv.weight", "model.model.transformer.layers.19.cross_attend_norm.beta", "model.model.transformer.layers.19.cross_attend_norm.gamma", "model.model.transformer.layers.19.cross_attn.to_kv.weight", "model.model.transformer.layers.19.cross_attn.to_out.weight", "model.model.transformer.layers.19.cross_attn.to_q.weight", "model.model.transformer.layers.19.ff.ff.0.proj.bias", "model.model.transformer.layers.19.ff.ff.0.proj.weight", "model.model.transformer.layers.19.ff.ff.2.bias", "model.model.transformer.layers.19.ff.ff.2.weight", "model.model.transformer.layers.19.ff_norm.beta", "model.model.transformer.layers.19.ff_norm.gamma", "model.model.transformer.layers.19.pre_norm.beta", "model.model.transformer.layers.19.pre_norm.gamma", "model.model.transformer.layers.19.self_attn.to_out.weight", 
"model.model.transformer.layers.19.self_attn.to_qkv.weight", "model.model.transformer.layers.2.cross_attend_norm.beta", "model.model.transformer.layers.2.cross_attend_norm.gamma", "model.model.transformer.layers.2.cross_attn.to_kv.weight", "model.model.transformer.layers.2.cross_attn.to_out.weight", "model.model.transformer.layers.2.cross_attn.to_q.weight", "model.model.transformer.layers.2.ff.ff.0.proj.bias", "model.model.transformer.layers.2.ff.ff.0.proj.weight", "model.model.transformer.layers.2.ff.ff.2.bias", "model.model.transformer.layers.2.ff.ff.2.weight", "model.model.transformer.layers.2.ff_norm.beta", "model.model.transformer.layers.2.ff_norm.gamma", "model.model.transformer.layers.2.pre_norm.beta", "model.model.transformer.layers.2.pre_norm.gamma", "model.model.transformer.layers.2.self_attn.to_out.weight", "model.model.transformer.layers.2.self_attn.to_qkv.weight", "model.model.transformer.layers.20.cross_attend_norm.beta", "model.model.transformer.layers.20.cross_attend_norm.gamma", "model.model.transformer.layers.20.cross_attn.to_kv.weight", "model.model.transformer.layers.20.cross_attn.to_out.weight", "model.model.transformer.layers.20.cross_attn.to_q.weight", "model.model.transformer.layers.20.ff.ff.0.proj.bias", "model.model.transformer.layers.20.ff.ff.0.proj.weight", "model.model.transformer.layers.20.ff.ff.2.bias", "model.model.transformer.layers.20.ff.ff.2.weight", "model.model.transformer.layers.20.ff_norm.beta", "model.model.transformer.layers.20.ff_norm.gamma", "model.model.transformer.layers.20.pre_norm.beta", "model.model.transformer.layers.20.pre_norm.gamma", "model.model.transformer.layers.20.self_attn.to_out.weight", "model.model.transformer.layers.20.self_attn.to_qkv.weight", "model.model.transformer.layers.21.cross_attend_norm.beta", "model.model.transformer.layers.21.cross_attend_norm.gamma", "model.model.transformer.layers.21.cross_attn.to_kv.weight", "model.model.transformer.layers.21.cross_attn.to_out.weight", 
"model.model.transformer.layers.21.cross_attn.to_q.weight", "model.model.transformer.layers.21.ff.ff.0.proj.bias", "model.model.transformer.layers.21.ff.ff.0.proj.weight", "model.model.transformer.layers.21.ff.ff.2.bias", "model.model.transformer.layers.21.ff.ff.2.weight", "model.model.transformer.layers.21.ff_norm.beta", "model.model.transformer.layers.21.ff_norm.gamma", "model.model.transformer.layers.21.pre_norm.beta", "model.model.transformer.layers.21.pre_norm.gamma", "model.model.transformer.layers.21.self_attn.to_out.weight", "model.model.transformer.layers.21.self_attn.to_qkv.weight", "model.model.transformer.layers.22.cross_attend_norm.beta", "model.model.transformer.layers.22.cross_attend_norm.gamma", "model.model.transformer.layers.22.cross_attn.to_kv.weight", "model.model.transformer.layers.22.cross_attn.to_out.weight", "model.model.transformer.layers.22.cross_attn.to_q.weight", "model.model.transformer.layers.22.ff.ff.0.proj.bias", "model.model.transformer.layers.22.ff.ff.0.proj.weight", "model.model.transformer.layers.22.ff.ff.2.bias", "model.model.transformer.layers.22.ff.ff.2.weight", "model.model.transformer.layers.22.ff_norm.beta", "model.model.transformer.layers.22.ff_norm.gamma", "model.model.transformer.layers.22.pre_norm.beta", "model.model.transformer.layers.22.pre_norm.gamma", "model.model.transformer.layers.22.self_attn.to_out.weight", "model.model.transformer.layers.22.self_attn.to_qkv.weight", "model.model.transformer.layers.23.cross_attend_norm.beta", "model.model.transformer.layers.23.cross_attend_norm.gamma", "model.model.transformer.layers.23.cross_attn.to_kv.weight", "model.model.transformer.layers.23.cross_attn.to_out.weight", "model.model.transformer.layers.23.cross_attn.to_q.weight", "model.model.transformer.layers.23.ff.ff.0.proj.bias", "model.model.transformer.layers.23.ff.ff.0.proj.weight", "model.model.transformer.layers.23.ff.ff.2.bias", "model.model.transformer.layers.23.ff.ff.2.weight", 
"model.model.transformer.layers.23.ff_norm.beta", "model.model.transformer.layers.23.ff_norm.gamma", "model.model.transformer.layers.23.pre_norm.beta", "model.model.transformer.layers.23.pre_norm.gamma", "model.model.transformer.layers.23.self_attn.to_out.weight", "model.model.transformer.layers.23.self_attn.to_qkv.weight", "model.model.transformer.layers.3.cross_attend_norm.beta", "model.model.transformer.layers.3.cross_attend_norm.gamma", "model.model.transformer.layers.3.cross_attn.to_kv.weight", "model.model.transformer.layers.3.cross_attn.to_out.weight", "model.model.transformer.layers.3.cross_attn.to_q.weight", "model.model.transformer.layers.3.ff.ff.0.proj.bias", "model.model.transformer.layers.3.ff.ff.0.proj.weight", "model.model.transformer.layers.3.ff.ff.2.bias", "model.model.transformer.layers.3.ff.ff.2.weight", "model.model.transformer.layers.3.ff_norm.beta", "model.model.transformer.layers.3.ff_norm.gamma", "model.model.transformer.layers.3.pre_norm.beta", "model.model.transformer.layers.3.pre_norm.gamma", "model.model.transformer.layers.3.self_attn.to_out.weight", "model.model.transformer.layers.3.self_attn.to_qkv.weight", "model.model.transformer.layers.4.cross_attend_norm.beta", "model.model.transformer.layers.4.cross_attend_norm.gamma", "model.model.transformer.layers.4.cross_attn.to_kv.weight", "model.model.transformer.layers.4.cross_attn.to_out.weight", "model.model.transformer.layers.4.cross_attn.to_q.weight", "model.model.transformer.layers.4.ff.ff.0.proj.bias", "model.model.transformer.layers.4.ff.ff.0.proj.weight", "model.model.transformer.layers.4.ff.ff.2.bias", "model.model.transformer.layers.4.ff.ff.2.weight", "model.model.transformer.layers.4.ff_norm.beta", "model.model.transformer.layers.4.ff_norm.gamma", "model.model.transformer.layers.4.pre_norm.beta", "model.model.transformer.layers.4.pre_norm.gamma", "model.model.transformer.layers.4.self_attn.to_out.weight", "model.model.transformer.layers.4.self_attn.to_qkv.weight", 
"model.model.transformer.layers.5.cross_attend_norm.beta", "model.model.transformer.layers.5.cross_attend_norm.gamma", "model.model.transformer.layers.5.cross_attn.to_kv.weight", "model.model.transformer.layers.5.cross_attn.to_out.weight", "model.model.transformer.layers.5.cross_attn.to_q.weight", "model.model.transformer.layers.5.ff.ff.0.proj.bias", "model.model.transformer.layers.5.ff.ff.0.proj.weight", "model.model.transformer.layers.5.ff.ff.2.bias", "model.model.transformer.layers.5.ff.ff.2.weight", "model.model.transformer.layers.5.ff_norm.beta", "model.model.transformer.layers.5.ff_norm.gamma", "model.model.transformer.layers.5.pre_norm.beta", "model.model.transformer.layers.5.pre_norm.gamma", "model.model.transformer.layers.5.self_attn.to_out.weight", "model.model.transformer.layers.5.self_attn.to_qkv.weight", "model.model.transformer.layers.6.cross_attend_norm.beta", "model.model.transformer.layers.6.cross_attend_norm.gamma", "model.model.transformer.layers.6.cross_attn.to_kv.weight", "model.model.transformer.layers.6.cross_attn.to_out.weight", "model.model.transformer.layers.6.cross_attn.to_q.weight", "model.model.transformer.layers.6.ff.ff.0.proj.bias", "model.model.transformer.layers.6.ff.ff.0.proj.weight", "model.model.transformer.layers.6.ff.ff.2.bias", "model.model.transformer.layers.6.ff.ff.2.weight", "model.model.transformer.layers.6.ff_norm.beta", "model.model.transformer.layers.6.ff_norm.gamma", "model.model.transformer.layers.6.pre_norm.beta", "model.model.transformer.layers.6.pre_norm.gamma", "model.model.transformer.layers.6.self_attn.to_out.weight", "model.model.transformer.layers.6.self_attn.to_qkv.weight", "model.model.transformer.layers.7.cross_attend_norm.beta", "model.model.transformer.layers.7.cross_attend_norm.gamma", "model.model.transformer.layers.7.cross_attn.to_kv.weight", "model.model.transformer.layers.7.cross_attn.to_out.weight", "model.model.transformer.layers.7.cross_attn.to_q.weight", 
"model.model.transformer.layers.7.ff.ff.0.proj.bias", "model.model.transformer.layers.7.ff.ff.0.proj.weight", "model.model.transformer.layers.7.ff.ff.2.bias", "model.model.transformer.layers.7.ff.ff.2.weight", "model.model.transformer.layers.7.ff_norm.beta", "model.model.transformer.layers.7.ff_norm.gamma", "model.model.transformer.layers.7.pre_norm.beta", "model.model.transformer.layers.7.pre_norm.gamma", "model.model.transformer.layers.7.self_attn.to_out.weight", "model.model.transformer.layers.7.self_attn.to_qkv.weight", "model.model.transformer.layers.8.cross_attend_norm.beta", "model.model.transformer.layers.8.cross_attend_norm.gamma", "model.model.transformer.layers.8.cross_attn.to_kv.weight", "model.model.transformer.layers.8.cross_attn.to_out.weight", "model.model.transformer.layers.8.cross_attn.to_q.weight", "model.model.transformer.layers.8.ff.ff.0.proj.bias", "model.model.transformer.layers.8.ff.ff.0.proj.weight", "model.model.transformer.layers.8.ff.ff.2.bias", "model.model.transformer.layers.8.ff.ff.2.weight", "model.model.transformer.layers.8.ff_norm.beta", "model.model.transformer.layers.8.ff_norm.gamma", "model.model.transformer.layers.8.pre_norm.beta", "model.model.transformer.layers.8.pre_norm.gamma", "model.model.transformer.layers.8.self_attn.to_out.weight", "model.model.transformer.layers.8.self_attn.to_qkv.weight", "model.model.transformer.layers.9.cross_attend_norm.beta", "model.model.transformer.layers.9.cross_attend_norm.gamma", "model.model.transformer.layers.9.cross_attn.to_kv.weight", "model.model.transformer.layers.9.cross_attn.to_out.weight", "model.model.transformer.layers.9.cross_attn.to_q.weight", "model.model.transformer.layers.9.ff.ff.0.proj.bias", "model.model.transformer.layers.9.ff.ff.0.proj.weight", "model.model.transformer.layers.9.ff.ff.2.bias", "model.model.transformer.layers.9.ff.ff.2.weight", "model.model.transformer.layers.9.ff_norm.beta", "model.model.transformer.layers.9.ff_norm.gamma", 
"model.model.transformer.layers.9.pre_norm.beta", "model.model.transformer.layers.9.pre_norm.gamma", "model.model.transformer.layers.9.self_attn.to_out.weight", "model.model.transformer.layers.9.self_attn.to_qkv.weight", "model.model.transformer.project_in.weight", "model.model.transformer.project_out.weight", "model.model.transformer.rotary_pos_emb.inv_freq", "pretransform.model.decoder.layers.0.bias", "pretransform.model.decoder.layers.0.weight_g", "pretransform.model.decoder.layers.0.weight_v", "pretransform.model.decoder.layers.1.layers.0.alpha", "pretransform.model.decoder.layers.1.layers.0.beta", "pretransform.model.decoder.layers.1.layers.1.bias", "pretransform.model.decoder.layers.1.layers.1.weight_g", "pretransform.model.decoder.layers.1.layers.1.weight_v", "pretransform.model.decoder.layers.1.layers.2.layers.0.alpha", "pretransform.model.decoder.layers.1.layers.2.layers.0.beta", "pretransform.model.decoder.layers.1.layers.2.layers.1.bias", "pretransform.model.decoder.layers.1.layers.2.layers.1.weight_g", "pretransform.model.decoder.layers.1.layers.2.layers.1.weight_v", "pretransform.model.decoder.layers.1.layers.2.layers.2.alpha", "pretransform.model.decoder.layers.1.layers.2.layers.2.beta", "pretransform.model.decoder.layers.1.layers.2.layers.3.bias", "pretransform.model.decoder.layers.1.layers.2.layers.3.weight_g", "pretransform.model.decoder.layers.1.layers.2.layers.3.weight_v", "pretransform.model.decoder.layers.1.layers.3.layers.0.alpha", "pretransform.model.decoder.layers.1.layers.3.layers.0.beta", "pretransform.model.decoder.layers.1.layers.3.layers.1.bias", "pretransform.model.decoder.layers.1.layers.3.layers.1.weight_g", "pretransform.model.decoder.layers.1.layers.3.layers.1.weight_v", "pretransform.model.decoder.layers.1.layers.3.layers.2.alpha", "pretransform.model.decoder.layers.1.layers.3.layers.2.beta", "pretransform.model.decoder.layers.1.layers.3.layers.3.bias", "pretransform.model.decoder.layers.1.layers.3.layers.3.weight_g", 
"pretransform.model.decoder.layers.1.layers.3.layers.3.weight_v", "pretransform.model.decoder.layers.1.layers.4.layers.0.alpha", "pretransform.model.decoder.layers.1.layers.4.layers.0.beta", "pretransform.model.decoder.layers.1.layers.4.layers.1.bias", "pretransform.model.decoder.layers.1.layers.4.layers.1.weight_g", "pretransform.model.decoder.layers.1.layers.4.layers.1.weight_v", "pretransform.model.decoder.layers.1.layers.4.layers.2.alpha", "pretransform.model.decoder.layers.1.layers.4.layers.2.beta", "pretransform.model.decoder.layers.1.layers.4.layers.3.bias", "pretransform.model.decoder.layers.1.layers.4.layers.3.weight_g", "pretransform.model.decoder.layers.1.layers.4.layers.3.weight_v", "pretransform.model.decoder.layers.2.layers.0.alpha", "pretransform.model.decoder.layers.2.layers.0.beta", "pretransform.model.decoder.layers.2.layers.1.bias", "pretransform.model.decoder.layers.2.layers.1.weight_g", "pretransform.model.decoder.layers.2.layers.1.weight_v", "pretransform.model.decoder.layers.2.layers.2.layers.0.alpha", "pretransform.model.decoder.layers.2.layers.2.layers.0.beta", "pretransform.model.decoder.layers.2.layers.2.layers.1.bias", "pretransform.model.decoder.layers.2.layers.2.layers.1.weight_g", "pretransform.model.decoder.layers.2.layers.2.layers.1.weight_v", "pretransform.model.decoder.layers.2.layers.2.layers.2.alpha", "pretransform.model.decoder.layers.2.layers.2.layers.2.beta", "pretransform.model.decoder.layers.2.layers.2.layers.3.bias", "pretransform.model.decoder.layers.2.layers.2.layers.3.weight_g", "pretransform.model.decoder.layers.2.layers.2.layers.3.weight_v", "pretransform.model.decoder.layers.2.layers.3.layers.0.alpha", "pretransform.model.decoder.layers.2.layers.3.layers.0.beta", "pretransform.model.decoder.layers.2.layers.3.layers.1.bias", "pretransform.model.decoder.layers.2.layers.3.layers.1.weight_g", "pretransform.model.decoder.layers.2.layers.3.layers.1.weight_v", "pretransform.model.decoder.layers.2.layers.3.layers.2.alpha", 
"pretransform.model.decoder.layers.2.layers.3.layers.2.beta", "pretransform.model.decoder.layers.2.layers.3.layers.3.bias", "pretransform.model.decoder.layers.2.layers.3.layers.3.weight_g", "pretransform.model.decoder.layers.2.layers.3.layers.3.weight_v", "pretransform.model.decoder.layers.2.layers.4.layers.0.alpha", "pretransform.model.decoder.layers.2.layers.4.layers.0.beta", "pretransform.model.decoder.layers.2.layers.4.layers.1.bias", "pretransform.model.decoder.layers.2.layers.4.layers.1.weight_g", "pretransform.model.decoder.layers.2.layers.4.layers.1.weight_v", "pretransform.model.decoder.layers.2.layers.4.layers.2.alpha", "pretransform.model.decoder.layers.2.layers.4.layers.2.beta", "pretransform.model.decoder.layers.2.layers.4.layers.3.bias", "pretransform.model.decoder.layers.2.layers.4.layers.3.weight_g", "pretransform.model.decoder.layers.2.layers.4.layers.3.weight_v", "pretransform.model.decoder.layers.3.layers.0.alpha", "pretransform.model.decoder.layers.3.layers.0.beta", "pretransform.model.decoder.layers.3.layers.1.bias", "pretransform.model.decoder.layers.3.layers.1.weight_g", "pretransform.model.decoder.layers.3.layers.1.weight_v", "pretransform.model.decoder.layers.3.layers.2.layers.0.alpha", "pretransform.model.decoder.layers.3.layers.2.layers.0.beta", "pretransform.model.decoder.layers.3.layers.2.layers.1.bias", "pretransform.model.decoder.layers.3.layers.2.layers.1.weight_g", "pretransform.model.decoder.layers.3.layers.2.layers.1.weight_v", "pretransform.model.decoder.layers.3.layers.2.layers.2.alpha", "pretransform.model.decoder.layers.3.layers.2.layers.2.beta", "pretransform.model.decoder.layers.3.layers.2.layers.3.bias", "pretransform.model.decoder.layers.3.layers.2.layers.3.weight_g", "pretransform.model.decoder.layers.3.layers.2.layers.3.weight_v", "pretransform.model.decoder.layers.3.layers.3.layers.0.alpha", "pretransform.model.decoder.layers.3.layers.3.layers.0.beta", "pretransform.model.decoder.layers.3.layers.3.layers.1.bias", 
"pretransform.model.decoder.layers.3.layers.3.layers.1.weight_g", "pretransform.model.decoder.layers.3.layers.3.layers.1.weight_v", "pretransform.model.decoder.layers.3.layers.3.layers.2.alpha", "pretransform.model.decoder.layers.3.layers.3.layers.2.beta", "pretransform.model.decoder.layers.3.layers.3.layers.3.bias", "pretransform.model.decoder.layers.3.layers.3.layers.3.weight_g", "pretransform.model.decoder.layers.3.layers.3.layers.3.weight_v", "pretransform.model.decoder.layers.3.layers.4.layers.0.alpha", "pretransform.model.decoder.layers.3.layers.4.layers.0.beta", "pretransform.model.decoder.layers.3.layers.4.layers.1.bias", "pretransform.model.decoder.layers.3.layers.4.layers.1.weight_g", "pretransform.model.decoder.layers.3.layers.4.layers.1.weight_v", "pretransform.model.decoder.layers.3.layers.4.layers.2.alpha", "pretransform.model.decoder.layers.3.layers.4.layers.2.beta", "pretransform.model.decoder.layers.3.layers.4.layers.3.bias", "pretransform.model.decoder.layers.3.layers.4.layers.3.weight_g", "pretransform.model.decoder.layers.3.layers.4.layers.3.weight_v", "pretransform.model.decoder.layers.4.layers.0.alpha", "pretransform.model.decoder.layers.4.layers.0.beta", "pretransform.model.decoder.layers.4.layers.1.bias", "pretransform.model.decoder.layers.4.layers.1.weight_g", "pretransform.model.decoder.layers.4.layers.1.weight_v", "pretransform.model.decoder.layers.4.layers.2.layers.0.alpha", "pretransform.model.decoder.layers.4.layers.2.layers.0.beta", "pretransform.model.decoder.layers.4.layers.2.layers.1.bias", "pretransform.model.decoder.layers.4.layers.2.layers.1.weight_g", "pretransform.model.decoder.layers.4.layers.2.layers.1.weight_v", "pretransform.model.decoder.layers.4.layers.2.layers.2.alpha", "pretransform.model.decoder.layers.4.layers.2.layers.2.beta", "pretransform.model.decoder.layers.4.layers.2.layers.3.bias", "pretransform.model.decoder.layers.4.layers.2.layers.3.weight_g", 
"pretransform.model.decoder.layers.4.layers.2.layers.3.weight_v", "pretransform.model.decoder.layers.4.layers.3.layers.0.alpha", "pretransform.model.decoder.layers.4.layers.3.layers.0.beta", "pretransform.model.decoder.layers.4.layers.3.layers.1.bias", "pretransform.model.decoder.layers.4.layers.3.layers.1.weight_g", "pretransform.model.decoder.layers.4.layers.3.layers.1.weight_v", "pretransform.model.decoder.layers.4.layers.3.layers.2.alpha", "pretransform.model.decoder.layers.4.layers.3.layers.2.beta", "pretransform.model.decoder.layers.4.layers.3.layers.3.bias", "pretransform.model.decoder.layers.4.layers.3.layers.3.weight_g", "pretransform.model.decoder.layers.4.layers.3.layers.3.weight_v", "pretransform.model.decoder.layers.4.layers.4.layers.0.alpha", "pretransform.model.decoder.layers.4.layers.4.layers.0.beta", "pretransform.model.decoder.layers.4.layers.4.layers.1.bias", "pretransform.model.decoder.layers.4.layers.4.layers.1.weight_g", "pretransform.model.decoder.layers.4.layers.4.layers.1.weight_v", "pretransform.model.decoder.layers.4.layers.4.layers.2.alpha", "pretransform.model.decoder.layers.4.layers.4.layers.2.beta", "pretransform.model.decoder.layers.4.layers.4.layers.3.bias", "pretransform.model.decoder.layers.4.layers.4.layers.3.weight_g", "pretransform.model.decoder.layers.4.layers.4.layers.3.weight_v", "pretransform.model.decoder.layers.5.layers.0.alpha", "pretransform.model.decoder.layers.5.layers.0.beta", "pretransform.model.decoder.layers.5.layers.1.bias", "pretransform.model.decoder.layers.5.layers.1.weight_g", "pretransform.model.decoder.layers.5.layers.1.weight_v", "pretransform.model.decoder.layers.5.layers.2.layers.0.alpha", "pretransform.model.decoder.layers.5.layers.2.layers.0.beta", "pretransform.model.decoder.layers.5.layers.2.layers.1.bias", "pretransform.model.decoder.layers.5.layers.2.layers.1.weight_g", "pretransform.model.decoder.layers.5.layers.2.layers.1.weight_v", "pretransform.model.decoder.layers.5.layers.2.layers.2.alpha", 
"pretransform.model.decoder.layers.5.layers.2.layers.2.beta", "pretransform.model.decoder.layers.5.layers.2.layers.3.bias", "pretransform.model.decoder.layers.5.layers.2.layers.3.weight_g", "pretransform.model.decoder.layers.5.layers.2.layers.3.weight_v", "pretransform.model.decoder.layers.5.layers.3.layers.0.alpha", "pretransform.model.decoder.layers.5.layers.3.layers.0.beta", "pretransform.model.decoder.layers.5.layers.3.layers.1.bias", "pretransform.model.decoder.layers.5.layers.3.layers.1.weight_g", "pretransform.model.decoder.layers.5.layers.3.layers.1.weight_v", "pretransform.model.decoder.layers.5.layers.3.layers.2.alpha", "pretransform.model.decoder.layers.5.layers.3.layers.2.beta", "pretransform.model.decoder.layers.5.layers.3.layers.3.bias", "pretransform.model.decoder.layers.5.layers.3.layers.3.weight_g", "pretransform.model.decoder.layers.5.layers.3.layers.3.weight_v", "pretransform.model.decoder.layers.5.layers.4.layers.0.alpha", "pretransform.model.decoder.layers.5.layers.4.layers.0.beta", "pretransform.model.decoder.layers.5.layers.4.layers.1.bias", "pretransform.model.decoder.layers.5.layers.4.layers.1.weight_g", "pretransform.model.decoder.layers.5.layers.4.layers.1.weight_v", "pretransform.model.decoder.layers.5.layers.4.layers.2.alpha", "pretransform.model.decoder.layers.5.layers.4.layers.2.beta", "pretransform.model.decoder.layers.5.layers.4.layers.3.bias", "pretransform.model.decoder.layers.5.layers.4.layers.3.weight_g", "pretransform.model.decoder.layers.5.layers.4.layers.3.weight_v", "pretransform.model.decoder.layers.6.alpha", "pretransform.model.decoder.layers.6.beta", "pretransform.model.decoder.layers.7.weight_g", "pretransform.model.decoder.layers.7.weight_v", "pretransform.model.encoder.layers.0.bias", "pretransform.model.encoder.layers.0.weight_g", "pretransform.model.encoder.layers.0.weight_v", "pretransform.model.encoder.layers.1.layers.0.layers.0.alpha", "pretransform.model.encoder.layers.1.layers.0.layers.0.beta", 
"pretransform.model.encoder.layers.1.layers.0.layers.1.bias", "pretransform.model.encoder.layers.1.layers.0.layers.1.weight_g", "pretransform.model.encoder.layers.1.layers.0.layers.1.weight_v", "pretransform.model.encoder.layers.1.layers.0.layers.2.alpha", "pretransform.model.encoder.layers.1.layers.0.layers.2.beta", "pretransform.model.encoder.layers.1.layers.0.layers.3.bias", "pretransform.model.encoder.layers.1.layers.0.layers.3.weight_g", "pretransform.model.encoder.layers.1.layers.0.layers.3.weight_v", "pretransform.model.encoder.layers.1.layers.1.layers.0.alpha", "pretransform.model.encoder.layers.1.layers.1.layers.0.beta", "pretransform.model.encoder.layers.1.layers.1.layers.1.bias", "pretransform.model.encoder.layers.1.layers.1.layers.1.weight_g", "pretransform.model.encoder.layers.1.layers.1.layers.1.weight_v", "pretransform.model.encoder.layers.1.layers.1.layers.2.alpha", "pretransform.model.encoder.layers.1.layers.1.layers.2.beta", "pretransform.model.encoder.layers.1.layers.1.layers.3.bias", "pretransform.model.encoder.layers.1.layers.1.layers.3.weight_g", "pretransform.model.encoder.layers.1.layers.1.layers.3.weight_v", "pretransform.model.encoder.layers.1.layers.2.layers.0.alpha", "pretransform.model.encoder.layers.1.layers.2.layers.0.beta", "pretransform.model.encoder.layers.1.layers.2.layers.1.bias", "pretransform.model.encoder.layers.1.layers.2.layers.1.weight_g", "pretransform.model.encoder.layers.1.layers.2.layers.1.weight_v", "pretransform.model.encoder.layers.1.layers.2.layers.2.alpha", "pretransform.model.encoder.layers.1.layers.2.layers.2.beta", "pretransform.model.encoder.layers.1.layers.2.layers.3.bias", "pretransform.model.encoder.layers.1.layers.2.layers.3.weight_g", "pretransform.model.encoder.layers.1.layers.2.layers.3.weight_v", "pretransform.model.encoder.layers.1.layers.3.alpha", "pretransform.model.encoder.layers.1.layers.3.beta", "pretransform.model.encoder.layers.1.layers.4.bias", 
"pretransform.model.encoder.layers.1.layers.4.weight_g", "pretransform.model.encoder.layers.1.layers.4.weight_v", "pretransform.model.encoder.layers.2.layers.0.layers.0.alpha", "pretransform.model.encoder.layers.2.layers.0.layers.0.beta", "pretransform.model.encoder.layers.2.layers.0.layers.1.bias", "pretransform.model.encoder.layers.2.layers.0.layers.1.weight_g", "pretransform.model.encoder.layers.2.layers.0.layers.1.weight_v", "pretransform.model.encoder.layers.2.layers.0.layers.2.alpha", "pretransform.model.encoder.layers.2.layers.0.layers.2.beta", "pretransform.model.encoder.layers.2.layers.0.layers.3.bias", "pretransform.model.encoder.layers.2.layers.0.layers.3.weight_g", "pretransform.model.encoder.layers.2.layers.0.layers.3.weight_v", "pretransform.model.encoder.layers.2.layers.1.layers.0.alpha", "pretransform.model.encoder.layers.2.layers.1.layers.0.beta", "pretransform.model.encoder.layers.2.layers.1.layers.1.bias", "pretransform.model.encoder.layers.2.layers.1.layers.1.weight_g", "pretransform.model.encoder.layers.2.layers.1.layers.1.weight_v", "pretransform.model.encoder.layers.2.layers.1.layers.2.alpha", "pretransform.model.encoder.layers.2.layers.1.layers.2.beta", "pretransform.model.encoder.layers.2.layers.1.layers.3.bias", "pretransform.model.encoder.layers.2.layers.1.layers.3.weight_g", "pretransform.model.encoder.layers.2.layers.1.layers.3.weight_v", "pretransform.model.encoder.layers.2.layers.2.layers.0.alpha", "pretransform.model.encoder.layers.2.layers.2.layers.0.beta", "pretransform.model.encoder.layers.2.layers.2.layers.1.bias", "pretransform.model.encoder.layers.2.layers.2.layers.1.weight_g", "pretransform.model.encoder.layers.2.layers.2.layers.1.weight_v", "pretransform.model.encoder.layers.2.layers.2.layers.2.alpha", "pretransform.model.encoder.layers.2.layers.2.layers.2.beta", "pretransform.model.encoder.layers.2.layers.2.layers.3.bias", "pretransform.model.encoder.layers.2.layers.2.layers.3.weight_g", 
"pretransform.model.encoder.layers.2.layers.2.layers.3.weight_v", "pretransform.model.encoder.layers.2.layers.3.alpha", "pretransform.model.encoder.layers.2.layers.3.beta", "pretransform.model.encoder.layers.2.layers.4.bias", "pretransform.model.encoder.layers.2.layers.4.weight_g", "pretransform.model.encoder.layers.2.layers.4.weight_v", "pretransform.model.encoder.layers.3.layers.0.layers.0.alpha", "pretransform.model.encoder.layers.3.layers.0.layers.0.beta", "pretransform.model.encoder.layers.3.layers.0.layers.1.bias", "pretransform.model.encoder.layers.3.layers.0.layers.1.weight_g", "pretransform.model.encoder.layers.3.layers.0.layers.1.weight_v", "pretransform.model.encoder.layers.3.layers.0.layers.2.alpha", "pretransform.model.encoder.layers.3.layers.0.layers.2.beta", "pretransform.model.encoder.layers.3.layers.0.layers.3.bias", "pretransform.model.encoder.layers.3.layers.0.layers.3.weight_g", "pretransform.model.encoder.layers.3.layers.0.layers.3.weight_v", "pretransform.model.encoder.layers.3.layers.1.layers.0.alpha", "pretransform.model.encoder.layers.3.layers.1.layers.0.beta", "pretransform.model.encoder.layers.3.layers.1.layers.1.bias", "pretransform.model.encoder.layers.3.layers.1.layers.1.weight_g", "pretransform.model.encoder.layers.3.layers.1.layers.1.weight_v", "pretransform.model.encoder.layers.3.layers.1.layers.2.alpha", "pretransform.model.encoder.layers.3.layers.1.layers.2.beta", "pretransform.model.encoder.layers.3.layers.1.layers.3.bias", "pretransform.model.encoder.layers.3.layers.1.layers.3.weight_g", "pretransform.model.encoder.layers.3.layers.1.layers.3.weight_v", "pretransform.model.encoder.layers.3.layers.2.layers.0.alpha", "pretransform.model.encoder.layers.3.layers.2.layers.0.beta", "pretransform.model.encoder.layers.3.layers.2.layers.1.bias", "pretransform.model.encoder.layers.3.layers.2.layers.1.weight_g", "pretransform.model.encoder.layers.3.layers.2.layers.1.weight_v", "pretransform.model.encoder.layers.3.layers.2.layers.2.alpha", 
"pretransform.model.encoder.layers.3.layers.2.layers.2.beta", "pretransform.model.encoder.layers.3.layers.2.layers.3.bias", "pretransform.model.encoder.layers.3.layers.2.layers.3.weight_g", "pretransform.model.encoder.layers.3.layers.2.layers.3.weight_v", "pretransform.model.encoder.layers.3.layers.3.alpha", "pretransform.model.encoder.layers.3.layers.3.beta", "pretransform.model.encoder.layers.3.layers.4.bias", "pretransform.model.encoder.layers.3.layers.4.weight_g", "pretransform.model.encoder.layers.3.layers.4.weight_v", "pretransform.model.encoder.layers.4.layers.0.layers.0.alpha", "pretransform.model.encoder.layers.4.layers.0.layers.0.beta", "pretransform.model.encoder.layers.4.layers.0.layers.1.bias", "pretransform.model.encoder.layers.4.layers.0.layers.1.weight_g", "pretransform.model.encoder.layers.4.layers.0.layers.1.weight_v", "pretransform.model.encoder.layers.4.layers.0.layers.2.alpha", "pretransform.model.encoder.layers.4.layers.0.layers.2.beta", "pretransform.model.encoder.layers.4.layers.0.layers.3.bias", "pretransform.model.encoder.layers.4.layers.0.layers.3.weight_g", "pretransform.model.encoder.layers.4.layers.0.layers.3.weight_v", "pretransform.model.encoder.layers.4.layers.1.layers.0.alpha", "pretransform.model.encoder.layers.4.layers.1.layers.0.beta", "pretransform.model.encoder.layers.4.layers.1.layers.1.bias", "pretransform.model.encoder.layers.4.layers.1.layers.1.weight_g", "pretransform.model.encoder.layers.4.layers.1.layers.1.weight_v", "pretransform.model.encoder.layers.4.layers.1.layers.2.alpha", "pretransform.model.encoder.layers.4.layers.1.layers.2.beta", "pretransform.model.encoder.layers.4.layers.1.layers.3.bias", "pretransform.model.encoder.layers.4.layers.1.layers.3.weight_g", "pretransform.model.encoder.layers.4.layers.1.layers.3.weight_v", "pretransform.model.encoder.layers.4.layers.2.layers.0.alpha", "pretransform.model.encoder.layers.4.layers.2.layers.0.beta", "pretransform.model.encoder.layers.4.layers.2.layers.1.bias", 
"pretransform.model.encoder.layers.4.layers.2.layers.1.weight_g", "pretransform.model.encoder.layers.4.layers.2.layers.1.weight_v", "pretransform.model.encoder.layers.4.layers.2.layers.2.alpha", "pretransform.model.encoder.layers.4.layers.2.layers.2.beta", "pretransform.model.encoder.layers.4.layers.2.layers.3.bias", "pretransform.model.encoder.layers.4.layers.2.layers.3.weight_g", "pretransform.model.encoder.layers.4.layers.2.layers.3.weight_v", "pretransform.model.encoder.layers.4.layers.3.alpha", "pretransform.model.encoder.layers.4.layers.3.beta", "pretransform.model.encoder.layers.4.layers.4.bias", "pretransform.model.encoder.layers.4.layers.4.weight_g", "pretransform.model.encoder.layers.4.layers.4.weight_v", "pretransform.model.encoder.layers.5.layers.0.layers.0.alpha", "pretransform.model.encoder.layers.5.layers.0.layers.0.beta", "pretransform.model.encoder.layers.5.layers.0.layers.1.bias", "pretransform.model.encoder.layers.5.layers.0.layers.1.weight_g", "pretransform.model.encoder.layers.5.layers.0.layers.1.weight_v", "pretransform.model.encoder.layers.5.layers.0.layers.2.alpha", "pretransform.model.encoder.layers.5.layers.0.layers.2.beta", "pretransform.model.encoder.layers.5.layers.0.layers.3.bias", "pretransform.model.encoder.layers.5.layers.0.layers.3.weight_g", "pretransform.model.encoder.layers.5.layers.0.layers.3.weight_v", "pretransform.model.encoder.layers.5.layers.1.layers.0.alpha", "pretransform.model.encoder.layers.5.layers.1.layers.0.beta", "pretransform.model.encoder.layers.5.layers.1.layers.1.bias", "pretransform.model.encoder.layers.5.layers.1.layers.1.weight_g", "pretransform.model.encoder.layers.5.layers.1.layers.1.weight_v", "pretransform.model.encoder.layers.5.layers.1.layers.2.alpha", "pretransform.model.encoder.layers.5.layers.1.layers.2.beta", "pretransform.model.encoder.layers.5.layers.1.layers.3.bias", "pretransform.model.encoder.layers.5.layers.1.layers.3.weight_g", 
"pretransform.model.encoder.layers.5.layers.1.layers.3.weight_v", "pretransform.model.encoder.layers.5.layers.2.layers.0.alpha", "pretransform.model.encoder.layers.5.layers.2.layers.0.beta", "pretransform.model.encoder.layers.5.layers.2.layers.1.bias", "pretransform.model.encoder.layers.5.layers.2.layers.1.weight_g", "pretransform.model.encoder.layers.5.layers.2.layers.1.weight_v", "pretransform.model.encoder.layers.5.layers.2.layers.2.alpha", "pretransform.model.encoder.layers.5.layers.2.layers.2.beta", "pretransform.model.encoder.layers.5.layers.2.layers.3.bias", "pretransform.model.encoder.layers.5.layers.2.layers.3.weight_g", "pretransform.model.encoder.layers.5.layers.2.layers.3.weight_v", "pretransform.model.encoder.layers.5.layers.3.alpha", "pretransform.model.encoder.layers.5.layers.3.beta", "pretransform.model.encoder.layers.5.layers.4.bias", "pretransform.model.encoder.layers.5.layers.4.weight_g", "pretransform.model.encoder.layers.5.layers.4.weight_v", "pretransform.model.encoder.layers.6.alpha", "pretransform.model.encoder.layers.6.beta", "pretransform.model.encoder.layers.7.bias", "pretransform.model.encoder.layers.7.weight_g", "pretransform.model.encoder.layers.7.weight_v". ```

Edit 1: My issue came from defaults.ini and from confusing --ckpt-path, --pretrained-ckpt-path and --pretransform-ckpt-path. See #87.
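
One way to tell which flag a checkpoint belongs with is to look at the key prefixes it contains. The sketch below is a minimal example, assuming the `safetensors` package is installed; `summarize_prefixes` and `list_checkpoint_keys` are hypothetical helper names and are not part of stable-audio-tools. The keys in the log above start with `model.` and `pretransform.`, which suggests that model.safetensors holds the full model rather than a bare autoencoder, so it probably belongs with --pretrained-ckpt-path rather than --pretransform-ckpt-path.

```python
from collections import Counter


def summarize_prefixes(keys, depth=1):
    """Count state_dict keys by their first `depth` dot-separated components."""
    counts = Counter()
    for key in keys:
        prefix = ".".join(key.split(".")[:depth])
        counts[prefix] += 1
    return counts


def list_checkpoint_keys(path):
    """Read tensor names from a .safetensors file without loading the weights."""
    # Imported lazily; assumes the safetensors package is installed.
    from safetensors import safe_open

    with safe_open(path, framework="pt") as f:
        return list(f.keys())


# A few keys copied from the error log in this issue.
sample_keys = [
    "model.model.transformer.project_in.weight",
    "model.model.transformer.project_out.weight",
    "pretransform.model.decoder.layers.0.bias",
    "pretransform.model.encoder.layers.0.bias",
]
prefix_counts = summarize_prefixes(sample_keys)
print(dict(prefix_counts))
```

To inspect a real file, pass `list_checkpoint_keys("stable-audio-open-1.0/model.safetensors")` to `summarize_prefixes`. A checkpoint whose keys begin directly with `encoder.` and `decoder.` would be the kind the AudioAutoencoder in the traceback expects.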


Keeping a .txt caption next to each .wav with the same base name would be much simpler than configuring a dataset. The .wav + .txt pairs can be handled like this:

custom_metadata.py
```python
import os


def get_custom_metadata(info, audio):
    # Get the filename from the info dictionary
    filename = os.path.basename(info["path"])
    # Extract the base name without extension
    base_name, _ = os.path.splitext(filename)
    # Construct the path to the corresponding text file
    txt_file_path = os.path.join(os.path.dirname(info["path"]), f"{base_name}.txt")

    if os.path.exists(txt_file_path):
        with open(txt_file_path, "r", encoding="utf-8") as f:
            prompt = f.read().strip()
    else:
        prompt = "No prompt available"

    return {"prompt": prompt}
```
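
Because the function above silently falls back to "No prompt available", it may be worth checking the folder for missing captions before training. This is a minimal sketch using only the standard library; `find_unpaired_wavs` is a hypothetical helper name, not something shipped with stable-audio-tools.

```python
from pathlib import Path


def find_unpaired_wavs(data_dir):
    """Return the .wav files under data_dir that have no same-named .txt file."""
    missing = []
    for wav_path in sorted(Path(data_dir).rglob("*.wav")):
        # The caption is expected next to the audio, e.g. clip.wav -> clip.txt
        if not wav_path.with_suffix(".txt").exists():
            missing.append(wav_path)
    return missing
```

Calling `find_unpaired_wavs("data")` returns an empty list when every clip has a caption; any paths it does return would be trained with the placeholder prompt.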

Kindly asking for a real relative path in place of the placeholder in the instructions: https://github.com/Stability-AI/stable-audio-tools/blob/b51af8b60a0e619780e6be5cd35bd2525073ec52/README.md?plain=1#L63 It would help to include a simple config and 2 example sound files so that fine-tuning can be replicated.


How to caption from sound to text? One option is lyramakesmusic/clap-interrogator, with key and BPM extracted using librosa.

Katehuuh commented 3 months ago

I have successfully fine-tuned stabilityai/stable-audio-open-1.0 using 32GB VRAM and 12 *.wav files, each 30s at 44100Hz:

python train.py --dataset-config dataset_config.json --model-config stable-audio-open-1.0\model_config.json --name wonder_pop_dataset --pretrained-ckpt-path stable-audio-open-1.0\model.safetensors --batch-size 1 --checkpoint-every 1 --save-dir saved_ckpt
dataset_config.json
```json
{
    "dataset_type": "audio_dir",
    "datasets": [
        {
            "id": "wonder_pop_dataset",
            "path": ".\\data",
            "custom_metadata_module": ".\\data\\custom_metadata.py"
        }
    ],
    "random_crop": false
}
```

---

```cmd
│   train.py
│
└─data
    │   custom_metadata.py
    │   wonder-pop1.txt
    └─  wonder-pop1.wav
```

#### Interface:

```
python unwrap_model.py --model-config stable-audio-open-1.0/model_config.json --ckpt-path ./saved_ckpt/wonder_pop_dataset/zo7frfa3/checkpoints/epoch=0-step=1.ckpt --name wonder_pop_dataset
python run_gradio.py --ckpt-path wonder_pop_dataset.ckpt --model-config stable-audio-open-1.0/model_config.json --model-half
```

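
The dataset_config.json from this comment can also be generated rather than written by hand, which avoids escaping backslashes. This is a minimal sketch: `build_dataset_config` is a hypothetical helper, only the fields shown in this comment are used, and it assumes forward-slash paths are accepted on Windows (Python itself handles them, but this was not tested against train.py).

```python
import json
from pathlib import Path


def build_dataset_config(dataset_id, data_dir, metadata_module="custom_metadata.py", random_crop=False):
    """Build a dataset config dict with the same fields as the example in this thread."""
    data_dir = Path(data_dir)
    return {
        "dataset_type": "audio_dir",
        "datasets": [
            {
                "id": dataset_id,
                # as_posix() writes forward slashes, so no JSON escaping is needed
                "path": data_dir.as_posix(),
                "custom_metadata_module": (data_dir / metadata_module).as_posix(),
            }
        ],
        "random_crop": random_crop,
    }


config = build_dataset_config("wonder_pop_dataset", "data")
config_text = json.dumps(config, indent=4)
print(config_text)
```

Writing `config_text` to dataset_config.json gives a file that can be passed to --dataset-config.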
JLenzy commented 2 months ago

@Katehuuh would you mind explaining how/where you actually incorporate the CLAP embeddings? Do you run CLAP inference each time with your custom_metadata.py or perhaps elsewhere?

Katehuuh commented 2 months ago

> @Katehuuh would you mind explaining how/where you actually incorporate the CLAP embeddings? Do you run CLAP inference each time with your custom_metadata.py or perhaps elsewhere?

I only fine-tuned from the released checkpoint with stable-audio-open-1.0\model_config.json, rather than pretraining from scratch, which requires the CLAP checkpoint path set here: https://github.com/Stability-AI/stable-audio-tools/blob/b51af8b60a0e619780e6be5cd35bd2525073ec52/stable_audio_tools/configs/model_configs/txt2audio/stable_audio_1_0.json#L45