WangHaoyuuu opened this issue 4 months ago
I am experiencing an issue with a training script for an audio-text model (HTSAT-tiny audio encoder, RoBERTa text encoder): the text_branch components do not load any pretrained weights. The unloaded components include every part of text_branch: word embeddings, position embeddings, attention layers, and all associated weights and biases. Is this expected?
Here is the warning:

```
text_branch.embeddings.word_embeddings.weight Unloaded
text_branch.embeddings.position_embeddings.weight Unloaded
text_branch.embeddings.token_type_embeddings.weight Unloaded
text_branch.embeddings.LayerNorm.weight Unloaded
text_branch.embeddings.LayerNorm.bias Unloaded
text_branch.encoder.layer.0.attention.self.query.weight Unloaded
text_branch.encoder.layer.0.attention.self.query.bias Unloaded
text_branch.encoder.layer.0.attention.self.key.weight Unloaded
text_branch.encoder.layer.0.attention.self.key.bias Unloaded
text_branch.encoder.layer.0.attention.self.value.weight Unloaded
text_branch.encoder.layer.0.attention.self.value.bias Unloaded
text_branch.encoder.layer.0.attention.output.dense.weight Unloaded
text_branch.encoder.layer.0.attention.output.dense.bias Unloaded
text_branch.encoder.layer.0.attention.output.LayerNorm.weight Unloaded
text_branch.encoder.layer.0.attention.output.LayerNorm.bias Unloaded
text_branch.encoder.layer.0.intermediate.dense.weight Unloaded
text_branch.encoder.layer.0.intermediate.dense.bias Unloaded
text_branch.encoder.layer.0.output.dense.weight Unloaded
text_branch.encoder.layer.0.output.dense.bias Unloaded
```
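The warning lists each parameter followed by Loaded or Unloaded. One plausible reading (an assumption, not taken from the training code) is that the loader simply checks each model parameter name against the keys of the checkpoint passed via `--pretrained-audio`. A minimal sketch of that logic, where `load_report` is a hypothetical helper, plain Python values stand in for tensors, and the `audio_branch.*` parameter name is a hypothetical example:

```python
# Sketch under an assumption: "Loaded"/"Unloaded" means "is this parameter
# name present in the checkpoint that was passed in?". Plain lists and dicts
# stand in for real modules and tensors.


def load_report(model_param_names, checkpoint):
    """Return {param_name: "Loaded" | "Unloaded"} for a partial checkpoint."""
    return {
        name: "Loaded" if name in checkpoint else "Unloaded"
        for name in model_param_names
    }


# One hypothetical audio-branch name plus two text-branch names copied
# from the warning above.
model_param_names = [
    "audio_branch.patch_embed.proj.weight",
    "text_branch.embeddings.word_embeddings.weight",
    "text_branch.encoder.layer.0.attention.self.query.weight",
]

# An audio-only checkpoint contains only audio_branch.* keys, so every
# text_branch.* parameter is reported as "Unloaded".
audio_checkpoint = {"audio_branch.patch_embed.proj.weight": [0.0]}

report = load_report(model_param_names, audio_checkpoint)
for name, status in report.items():
    print(name, status)
```

Under this assumption, Unloaded only means the parameter is absent from that particular checkpoint. It does not by itself show whether text_branch was initialized from a pretrained RoBERTa model somewhere else in the model-construction code.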
Here is my shell script:

```bash
#!/bin/bash
CUDA_VISIBLE_DEVICES=1 python -m training.main \
    --save-frequency 5 \
    --save-top-performance 3 \
    --save-most-recent \
    --dataset-type="webdataset" \
    --datasetpath='/home/ubuntu/AudioLDM-training-finetuning/data/dataset' \
    --precision="fp32" \
    --batch-size=32 \
    --lr=1e-4 \
    --wd=0.0 \
    --epochs=45 \
    --workers=1 \
    --use-bn-sync \
    --amodel HTSAT-tiny \
    --tmodel roberta \
    --warmup 3200 \
    --datasetnames "Clotho" \
    --datasetinfos "train" \
    --top-k-checkpoint-select-dataset="Clotho-test" \
    --top-k-checkpoint-select-metric="mAP@10" \
    --logs 'logs' \
    --seed 3407 \
    --gather-with-grad \
    --optimizer "adam" \
    --data-filling "repeatpad" \
    --data-truncating "rand_trunc" \
    --pretrained-audio '/home/ubuntu/AudioLDM-training-finetuning/data/checkpoints/HTSAT-fullset-imagenet-tiny-map=0.467.ckpt'
```
My evaluation results are very poor.
Could you please help me?
It seems that these weights are loaded successfully with the updated version (1.1.6).
However, the warning still appears.
I'm not sure whether this matters.
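To check whether anything besides text_branch still reports Unloaded after upgrading to 1.1.6, the pasted log can be tallied per top-level module. A minimal sketch; `summarize_unloaded` is a hypothetical helper, and it assumes the log excerpt has been trimmed to whitespace-separated `<parameter name> <status>` pairs like the warning above (the `audio_branch.*` name in the sample is a hypothetical example):

```python
from collections import Counter


def summarize_unloaded(log_text):
    """Count "Unloaded" parameters per top-level module (e.g. text_branch).

    Assumes the text alternates "<param_name> <status>" tokens separated by
    any whitespace (spaces, tabs, or newlines).
    """
    tokens = log_text.split()
    counts = Counter()
    # Pair each parameter name with the status token that follows it.
    for name, status in zip(tokens[0::2], tokens[1::2]):
        if status == "Unloaded":
            counts[name.split(".", 1)[0]] += 1
    return counts


sample = (
    "text_branch.embeddings.word_embeddings.weight Unloaded "
    "text_branch.embeddings.position_embeddings.weight Unloaded "
    "audio_branch.patch_embed.proj.weight Loaded"
)
print(summarize_unloaded(sample))  # Counter({'text_branch': 2})
```

If the real log contains extra tokens (timestamps, logger prefixes), the name/status pairing shifts, so strip those first.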