PKU-YuanGroup / LanguageBind

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
https://arxiv.org/abs/2310.01852
MIT License

What are the training configurations for full tuning? #32

Closed · StanLei52 closed this issue 4 months ago

StanLei52 commented 4 months ago

Hi, I noticed that your paper reports results for full tuning. I'd like to know the training configuration for full tuning -- do you use text prompts and input-modality data with contrastive learning, or class labels with a traditional classification setup (e.g., cross-entropy loss)? Thank you.

LinB203 commented 4 months ago

Sorry for the late reply. You can refer to this. We didn't add any tricks; we just removed LoRA.
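[Editor's note] To make the answer above concrete: full tuning keeps the same language-based contrastive objective as the LoRA run; only the adapter layers are dropped, so all backbone parameters receive gradients. A minimal sketch of a CLIP-style symmetric contrastive loss (function and argument names here are illustrative, not LanguageBind's actual code):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(video_emb, text_emb, logit_scale=100.0):
    """Symmetric InfoNCE loss over a batch of paired video/text embeddings.

    Matched pairs sit on the diagonal of the similarity matrix; each
    direction (video->text, text->video) is a cross-entropy over the batch.
    """
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = logit_scale * v @ t.T            # (B, B) similarity matrix
    labels = torch.arange(v.size(0), device=v.device)
    return (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.T, labels)) / 2
```

Under this setup, "full tuning" simply means this loss backpropagates through the full encoder rather than through LoRA adapters only.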

aopolin-lv commented 4 months ago

> Sorry for the late reply. You can refer to this. We didn't add any tricks; we just removed LoRA.

The hyperlink is broken; please update it. Thank you!

LinB203 commented 4 months ago

> > Sorry for the late reply. You can refer to this. We didn't add any tricks; we just removed LoRA.
>
> The hyperlink is broken; please update it. Thank you!

Sorry, we have updated it: https://github.com/PKU-YuanGroup/LanguageBind/blob/main/scripts/video_language/train_1.5_large.sh
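[Editor's note] For readers skimming this thread: the practical takeaway is that the full-tuning run reuses the same contrastive training entry point as the LoRA run, minus the LoRA flags. A hypothetical sketch of that difference (the flag and module names below are illustrative assumptions, not the repo's exact CLI -- see the linked train_1.5_large.sh for the real arguments):

```shell
# LoRA run (illustrative flags, not the exact CLI):
#   torchrun --nproc_per_node 8 -m main \
#       --train-data /path/to/video_text_pairs \
#       --convert_to_lora --lora_r 2 --lora_alpha 16

# Full tuning: the same command with the LoRA flags removed,
# so every backbone parameter is updated by the contrastive loss.
torchrun --nproc_per_node 8 -m main \
    --train-data /path/to/video_text_pairs
```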

aopolin-lv commented 4 months ago

> > > Sorry for the late reply. You can refer to this. We didn't add any tricks; we just removed LoRA.
> >
> > The hyperlink is broken; please update it. Thank you!
>
> Sorry, we have updated it: https://github.com/PKU-YuanGroup/LanguageBind/blob/main/scripts/video_language/train_1.5_large.sh

Thanks for your response!