FoundationVision / Groma

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
https://groma-mllm.github.io/
Apache License 2.0

4 bit model #9

Closed · vcadillog closed this 1 month ago

vcadillog commented 1 month ago

Thanks for your research; I have been trying your model and found it very impressive and useful. Since I saw some people asking for it, I have published a version of the model on Hugging Face quantized to 4 bits with bitsandbytes.

As for hardware requirements: the minimum GPU memory needed is about 7 GB of VRAM for single inference.

If someone finds it useful, please feel free to use it.
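For readers who want to follow the same route, below is a minimal sketch of what a bitsandbytes 4-bit load typically looks like with Hugging Face transformers. The repo id `your-username/groma-4bit` is a placeholder, not the actual upload, and Groma ships its own model-loading code, so treat this as the generic pattern rather than this project's exact API.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization via bitsandbytes; fp16 compute keeps inference fast.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# "your-username/groma-4bit" is a hypothetical repo id used for illustration.
model = AutoModelForCausalLM.from_pretrained(
    "your-username/groma-4bit",
    quantization_config=bnb_config,
    device_map="auto",  # place layers across available GPUs/CPU automatically
)
tokenizer = AutoTokenizer.from_pretrained("your-username/groma-4bit")
```

With 4-bit weights the 7B-scale language backbone fits comfortably within the ~7 GB figure quoted above, since 4-bit storage roughly quarters the fp16 memory footprint before activations and the vision tower are counted.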

machuofan commented 1 month ago

Thanks for your contribution!