LAION-AI / CLAP

Contrastive Language-Audio Pretraining
https://arxiv.org/abs/2211.06687
Creative Commons Zero v1.0 Universal

text encoder does not support pretrained models #159

Open FeminaS17 opened 3 months ago

FeminaS17 commented 3 months ago

https://huggingface.co/lukewys/laion_clap/resolve/main/music_audioset_epoch_15_esc_90.14.pt

When I try to fine-tune this model with the training script, I get the error "AssertionError: bert/roberta/bart text encoder does not support pretrained models."

The error occurs for the same model with both transformers 4.30.0 and 4.30.2. Please suggest a workaround. @waldleitner @lukewys @Neptune-S-777

waldleitner commented 2 weeks ago

@FeminaS17 I think the failure comes from this check in main.py. My assumption is that, because the projection layers on top of the text encoder are trained from scratch, you cannot specify both a pretrained checkpoint and a separate bert/roberta/bart text encoder.

    if args.tmodel == "bert" or args.tmodel == "roberta" or args.tmodel == "bart":
        assert (
            args.pretrained == "" or args.pretrained is None
        ), "bert/roberta/bart text encoder does not support pretrained models."

https://github.com/LAION-AI/CLAP/blob/8e558817d853808486768004fa1b61ac9d69f2a2/src/laion_clap/training/main.py#L140C1-L144C1
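As a side note, if the goal is just to confirm that the downloaded checkpoint itself is fine, it can be loaded outside the training script through the packaged `laion_clap` inference API. The snippet below is only a sketch: `amodel='HTSAT-base'` and the local checkpoint filename follow the README's notes for the music checkpoints, and it does not replace the fine-tuning run (for that, the check above means `args.pretrained` has to stay empty when `--tmodel` is bert/roberta/bart, unless the assertion is adapted locally).

    # Sketch only: verify the checkpoint loads via the packaged inference API.
    # 'HTSAT-base' and the filename are assumptions taken from the README's
    # instructions for the music_audioset checkpoint.
    import laion_clap

    model = laion_clap.CLAP_Module(enable_fusion=False, amodel='HTSAT-base')
    model.load_ckpt('music_audioset_epoch_15_esc_90.14.pt')

    # Quick sanity check: embed two text prompts (returns a numpy array).
    text_embed = model.get_text_embedding(
        ['a solo piano melody', 'a distorted electric guitar riff'],
        use_tensor=False,
    )
    print(text_embed.shape)  # joint embedding space, typically (2, 512)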