LAION-AI / CLAP

Contrastive Language-Audio Pretraining
https://arxiv.org/abs/2211.06687
Creative Commons Zero v1.0 Universal

Weights Not Loading for text_branch in Training Script #158

Open WangHaoyuuu opened 1 month ago

WangHaoyuuu commented 1 month ago

I am running into a problem with the training script for an audio-language model: the text_branch components do not load any pre-trained weights. The unloaded components include every layer of text_branch: word embeddings, position embeddings, attention layers, and all associated weights and biases. Is this expected, or do these weights need to be loaded?

Here is the warning:

text_branch.embeddings.word_embeddings.weight Unloaded
text_branch.embeddings.position_embeddings.weight Unloaded
text_branch.embeddings.token_type_embeddings.weight Unloaded
text_branch.embeddings.LayerNorm.weight Unloaded
text_branch.embeddings.LayerNorm.bias Unloaded
text_branch.encoder.layer.0.attention.self.query.weight Unloaded
text_branch.encoder.layer.0.attention.self.query.bias Unloaded
text_branch.encoder.layer.0.attention.self.key.weight Unloaded
text_branch.encoder.layer.0.attention.self.key.bias Unloaded
text_branch.encoder.layer.0.attention.self.value.weight Unloaded
text_branch.encoder.layer.0.attention.self.value.bias Unloaded
text_branch.encoder.layer.0.attention.output.dense.weight Unloaded
text_branch.encoder.layer.0.attention.output.dense.bias Unloaded
text_branch.encoder.layer.0.attention.output.LayerNorm.weight Unloaded
text_branch.encoder.layer.0.attention.output.LayerNorm.bias Unloaded
text_branch.encoder.layer.0.intermediate.dense.weight Unloaded
text_branch.encoder.layer.0.intermediate.dense.bias Unloaded
text_branch.encoder.layer.0.output.dense.weight Unloaded
text_branch.encoder.layer.0.output.dense.bias Unloaded
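With a long log like this, it can help to confirm which top-level modules are actually affected. The following is a minimal sketch, assuming the log consists of alternating `<parameter name> <status>` tokens as in the excerpt above; the helper name `summarize_unloaded` is hypothetical and not part of CLAP.

```python
from collections import Counter


def summarize_unloaded(log_text):
    """Count parameters reported as 'Unloaded', grouped by top-level module name."""
    tokens = log_text.split()
    counts = Counter()
    # Assumption: the log alternates "<parameter name> <status>" tokens,
    # so pair each name with the status token that follows it.
    for name, status in zip(tokens[0::2], tokens[1::2]):
        if status == "Unloaded":
            counts[name.split(".")[0]] += 1
    return dict(counts)


# Three entries copied from the warning above.
sample = (
    "text_branch.embeddings.word_embeddings.weight Unloaded "
    "text_branch.embeddings.position_embeddings.weight Unloaded "
    "text_branch.encoder.layer.0.attention.self.query.weight Unloaded"
)
print(summarize_unloaded(sample))  # {'text_branch': 3}
```

If every count falls under `text_branch`, the report concerns the text encoder only.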

Here is my shell script:

#!/bin/bash

CUDA_VISIBLE_DEVICES=1 python -m training.main \
    --save-frequency 5 \
    --save-top-performance 3 \
    --save-most-recent \
    --dataset-type="webdataset" \
    --datasetpath='/home/ubuntu/AudioLDM-training-finetuning/data/dataset' \
    --precision="fp32" \
    --batch-size=32 \
    --lr=1e-4 \
    --wd=0.0 \
    --epochs=45 \
    --workers=1 \
    --use-bn-sync \
    --amodel HTSAT-tiny \
    --tmodel roberta \
    --warmup 3200 \
    --datasetnames "Clotho" \
    --datasetinfos "train" \
    --top-k-checkpoint-select-dataset="Clotho-test" \
    --top-k-checkpoint-select-metric="mAP@10" \
    --logs 'logs' \
    --seed 3407 \
    --gather-with-grad \
    --optimizer "adam" \
    --data-filling "repeatpad" \
    --data-truncating "rand_trunc" \
    --pretrained-audio '/home/ubuntu/AudioLDM-training-finetuning/data/checkpoints/HTSAT-fullset-imagenet-tiny-map=0.467.ckpt'
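Note that the script passes only `--pretrained-audio`. One possible reading, which is an assumption and not confirmed anywhere in this thread, is that the Loaded/Unloaded report is produced by checking each model parameter name against the keys of that audio checkpoint. Under that assumption, every `text_branch` entry would be listed as Unloaded simply because an audio-only checkpoint contains no text-encoder keys. The sketch below illustrates the idea with plain dictionaries; `report_loaded` and the `audio_branch` parameter name are hypothetical stand-ins, not CLAP code.

```python
def report_loaded(param_names, checkpoint_keys):
    """Mark each parameter name as 'Loaded' if the checkpoint has a matching key."""
    keys = set(checkpoint_keys)
    return {name: ("Loaded" if name in keys else "Unloaded") for name in param_names}


# Hypothetical parameter names: one audio parameter, one text parameter.
model_params = [
    "audio_branch.layer.weight",
    "text_branch.embeddings.word_embeddings.weight",
]
# Hypothetical audio-only checkpoint: it has no text_branch keys at all.
audio_only_checkpoint = {"audio_branch.layer.weight": None}

for name, status in report_loaded(model_params, audio_only_checkpoint).items():
    print(name, status)
```

If this reading is right, the message alone would not show whether the text encoder was initialised from its own pre-trained weights elsewhere in the code.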

Here are my evaluation results: [screenshot]

They are very poor.

Could you please help me?

zjsong commented 1 month ago

It seems that the aforementioned weights are loaded successfully when using the updated version (1.1.6): [screenshot]

However, the warning is still there: [screenshot]

I'm not sure whether this matters.
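To compare against the version mentioned above (1.1.6), the installed package version can be read from Python. This is a minimal sketch that assumes the package was installed with pip under the distribution name `laion_clap`; that name is an assumption here, and the check does not apply if the code is run directly from a cloned repository.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Assumed distribution name; adjust it to match how the package was installed.
print(installed_version("laion_clap"))
```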