OSU-NLP-Group / UGround

Official Repo for UGround
https://osu-nlp-group.github.io/UGround/

Inference hardware requirements #2

Open abrichr opened 1 week ago

abrichr commented 1 week ago

Hello, and thank you for the excellent work!

In the paper it says:

The first stage takes about 50 hours on a single 4x NVIDIA A100 machine (global batch size 128 with gradient accumulation). And for the large scale GUI data training, we use 112 NVIDIA H100 GPUs and finish the training in about 6 hours (global batch size 448).

Can you please clarify the inference-time hardware requirements? Is there any chance of running this on CPU?

Thanks again!

boyugou commented 1 week ago

Overall, it's built on LLaVA with slight adaptations (mainly to the input image processing), so it's definitely possible to run it on CPU (take Ollama as a reference). I remember 4-bit LLaVA running quite smoothly on my laptop.
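
For reference, here is a minimal sketch of CPU inference through Ollama's REST API. This is not an official recipe: the `llava` model tag is a stand-in for a quantized LLaVA-family model already pulled into Ollama, and a UGround checkpoint would need its own conversion/quantization (e.g., to GGUF) before it could be served this way.

```python
# Minimal sketch (assumptions noted above): query a quantized LLaVA-style model
# served locally by Ollama on CPU via its REST API.
import base64
import requests

def ground_element(screenshot_path: str, instruction: str, model: str = "llava") -> str:
    # Ollama's /api/generate endpoint accepts base64-encoded images.
    with open(screenshot_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,        # stand-in tag; not an official UGround release
            "prompt": instruction,
            "images": [image_b64],
            "stream": False,
        },
        timeout=300,
    )
    response.raise_for_status()
    return response.json()["response"]

if __name__ == "__main__":
    # Hypothetical usage: ask the model to locate a UI element in a screenshot.
    print(ground_element("screenshot.png", "Where is the search button?"))
```

Expect CPU inference to be noticeably slower than GPU, but aggressive quantization (4-bit) keeps memory use and latency manageable for a 7B-scale LLaVA-family model on a laptop.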