THUDM / CogVideo

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Apache License 2.0

Can cogvlm2-llama3-caption generate Chinese caption? #319

Closed by MinliangLin 3 weeks ago

MinliangLin commented 1 month ago

Feature request

Can cogvlm2-llama3-caption generate Chinese captions? If not, is it possible to fine-tune a Chinese captioning model at low cost, e.g. 20 to 60 GPU hours?

Motivation

I want to generate Chinese captions for a very large dataset using cogvlm2-llama3-caption.

Your contribution

N/A

huangshiyu13 commented 1 month ago

CogVLM2-Caption cannot generate Chinese captions. You can fine-tune your own Chinese caption model using SWIFT: https://github.com/modelscope/ms-swift/blob/main/docs/source_en/Multi-Modal/cogvlm2-video-best-practice.md
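
For anyone preparing data for such a fine-tune, here is a minimal sketch of turning (video, Chinese caption) pairs into a JSONL training file. It is not taken from the thread or the linked guide: the field names `query`, `response`, and `videos`, the prompt text, and the helper names `build_records` and `write_jsonl` are all assumptions for illustration, so check the expected dataset layout in the SWIFT guide before training.

```python
import json

# Hypothetical instruction asking the model for a Chinese caption
# ("Please describe this video in detail in Chinese.").
PROMPT = "请用中文详细描述这个视频。"


def build_records(pairs, prompt=PROMPT):
    """Turn (video_path, chinese_caption) pairs into training records.

    The keys `query`, `response`, and `videos` are an assumed layout;
    verify them against the SWIFT documentation linked above.
    """
    records = []
    for video_path, caption in pairs:
        caption = caption.strip()
        if not caption:
            continue  # skip clips that have no usable caption
        records.append(
            {"query": prompt, "response": caption, "videos": [str(video_path)]}
        )
    return records


def write_jsonl(records, out_path):
    """Write one JSON object per line and return the number of lines written."""
    with open(out_path, "w", encoding="utf-8") as f:
        for rec in records:
            # ensure_ascii=False keeps Chinese characters readable in the file.
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
    return len(records)
```

A call such as `write_jsonl(build_records(pairs), "train_zh.jsonl")` would produce the file; the file name here is only a placeholder. Pairs with an empty caption are dropped rather than written, on the assumption that empty targets would only add noise to the fine-tune.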