[Open] yeahdongcn opened this issue 3 months ago
User story: I want to benchmark Ollama running inside a Docker container, and I would prefer to install llm_benchmark in a venv or conda env on a different host or in a separate container.
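As a minimal sketch of what that setup needs, assuming the container publishes Ollama's default port 11434 (e.g. `docker run -p 11434:11434 ollama/ollama`) and that the benchmark only needs a reachable base URL. `OLLAMA_HOST` below follows Ollama's own convention for overriding the server address; the rest is illustrative:

```python
import os

import requests

# Base URL of the Ollama server; point this at the Docker host when the
# benchmark runs in a separate venv/container.
OLLAMA_HOST = os.environ.get("OLLAMA_HOST", "http://localhost:11434")


def server_version() -> str:
    """Quick connectivity check against the (possibly remote) Ollama server."""
    resp = requests.get(f"{OLLAMA_HOST}/api/version", timeout=5)
    resp.raise_for_status()
    return resp.json()["version"]


if __name__ == "__main__":
    print("Connected to Ollama", server_version())
```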
Ollama API doc: https://github.com/ollama/ollama/blob/main/docs/api.md#pull-a-model
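For reference, pulling a model through that endpoint from a remote client looks roughly like this (the model name is a placeholder; per the linked doc, with `"stream": false` the server replies with a single status object once the pull finishes):

```python
import requests

OLLAMA_HOST = "http://localhost:11434"  # replace with the container's address

# POST /api/pull downloads the model on the server side, so no model files
# need to exist on the benchmarking host. Older Ollama versions use the
# field "name" instead of "model".
resp = requests.post(
    f"{OLLAMA_HOST}/api/pull",
    json={"model": "llama3", "stream": False},
    timeout=None,  # pulls can take a while
)
resp.raise_for_status()
print(resp.json())  # {'status': 'success'} on completion
```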
I sent a PR that adds querying device information through the Ollama API: https://github.com/ollama/ollama/pull/5479. It could replace GPUtil for checking available VRAM, which matters here because GPUtil only sees GPUs on the machine running the benchmark, not the GPUs visible to a remote Ollama server.
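A sketch of how the benchmark side could consume that, heavily hedged: the `/api/devices` path and the `devices` / `free_memory` response fields below are assumptions based on the proposal in that PR, not a stable API, and would need to match whatever shape the PR lands with:

```python
import requests

OLLAMA_HOST = "http://localhost:11434"


def available_vram_bytes() -> int:
    """Query free VRAM from the Ollama server instead of local GPUtil.

    Endpoint path and response fields are assumptions based on
    ollama/ollama#5479; adjust once the PR's final shape is known.
    """
    resp = requests.get(f"{OLLAMA_HOST}/api/devices", timeout=5)  # hypothetical endpoint
    resp.raise_for_status()
    devices = resp.json().get("devices", [])  # hypothetical field
    return sum(d.get("free_memory", 0) for d in devices)  # bytes, hypothetical
```

The point of the design is that the VRAM figure then describes the GPUs the Ollama server actually runs on, regardless of where llm_benchmark itself is installed.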