FoundationVision / Groma

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
https://groma-mllm.github.io/
Apache License 2.0
483 stars 55 forks

System requirements for running the model? #1

Closed learnermaxRL closed 2 months ago

learnermaxRL commented 2 months ago

Can you please detail the system requirements? Can this run on a Mac M2 Air?

machuofan commented 2 months ago

Hi, Groma-7b takes 30-40 GB of memory for inference on a single GPU. We have not tested it on CPU. I guess you would need to quantize the model, as in LLaVA, to make it run on a Mac.
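A rough back-of-the-envelope calculation shows why quantization helps here. The numbers below are illustrative estimates for a generic 7B-parameter model, not measured Groma figures; the actual 30-40 GB footprint also includes activations, the vision tower, and KV cache on top of the weights.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

params = 7e9  # ~7 billion parameters (Groma-7b scale)

fp16_gb = weight_memory_gb(params, 2.0)  # fp16: 2 bytes/param -> 14.0 GB
int4_gb = weight_memory_gb(params, 0.5)  # 4-bit quantized: 0.5 bytes/param -> 3.5 GB

print(f"fp16 weights: {fp16_gb:.1f} GB, 4-bit weights: {int4_gb:.1f} GB")
```

So quantizing to 4 bits cuts the weight memory by roughly 4x, which is what makes running a 7B model plausible on a machine without a large GPU, as the LLaVA quantization setups do.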