FoundationVision / Groma

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
https://groma-mllm.github.io/
Apache License 2.0
483 stars, 55 forks

About pretrain checkpoint #16

Closed by xuliu-cyber 1 month ago

xuliu-cyber commented 1 month ago

Hello, do the checkpoints under groma-7b-finetune contain the DINOv2 checkpoint and the region proposal checkpoint, or will those be downloaded automatically?

machuofan commented 1 month ago

Yes, groma-7b-finetune contains weights of all parts.
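
To confirm locally which components a checkpoint covers, one option is to group its parameter names by their leading prefix and check that each expected part is present. The snippet below is only a minimal sketch of that idea: `count_by_prefix` is a hypothetical helper, and the parameter names in `example_names` are made up for illustration; they are not the actual key names used in groma-7b-finetune. In practice the names would come from the keys of the loaded checkpoint.

```python
from collections import Counter
from typing import Iterable


def count_by_prefix(param_names: Iterable[str], depth: int = 1) -> Counter:
    """Count parameter names by their first `depth` dot-separated components."""
    counts: Counter = Counter()
    for name in param_names:
        prefix = ".".join(name.split(".")[:depth])
        counts[prefix] += 1
    return counts


# Hypothetical parameter names, for illustration only (assumption: the real
# checkpoint uses different names).
example_names = [
    "vision_encoder.blocks.0.attn.weight",
    "vision_encoder.blocks.0.mlp.weight",
    "region_proposer.head.weight",
    "language_model.layers.0.attn.weight",
    "language_model.layers.0.mlp.weight",
]

for prefix, n in sorted(count_by_prefix(example_names).items()):
    print(f"{prefix}: {n} tensors")
```

If a prefix for an expected component is missing from the summary, that component's weights are probably not part of the file being inspected.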