gokayfem / ComfyUI_VLM_nodes

Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
Apache License 2.0
308 stars · 24 forks

Moondream inference slow even after cuda acceleration installed for llama-cpp-python #17

Closed procomp91 closed 5 months ago

procomp91 commented 5 months ago

Partly issue, partly solution. The `text_model.py` for moondream doesn't seem to take advantage of the GPU. Changing `device_map` to `"auto"` accelerated inference on my 3090 from ~20 s to ~0.5 s:

```python
self.model = load_checkpoint_and_dispatch(
    self.model,
    f"{model_path}/text_model.pt",
    device_map="auto",
)
```

procomp91 commented 5 months ago

also, thanks for the nodes :)

gokayfem commented 5 months ago

thanks for the correction, i didn't test the moondream node that much, thanks again for the excellent feedback 👍