Open dan-and opened 5 months ago
I suggest adding a "WSL" (Windows Subsystem for Linux) category, or at least a tag within the Linux category. To detect whether a Linux distribution is running in WSL, just check whether the output of the `uname -r` command (or `platform.uname().release` in Python) ends with `-WSL2`:

`5.15.146.1-microsoft-standard-WSL2`

I'm not sure about the suffix for WSL1, maybe `-Microsoft` (found from this file). Linux on bare metal and Linux in WSL use different CUDA libraries and are not the same environment.
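A minimal sketch of such a check, assuming the kernel-release suffixes mentioned above (the helper name `detect_wsl` is illustrative, not existing code in this project):

```python
import platform
from typing import Optional

def detect_wsl() -> Optional[str]:
    """Return 'WSL2', 'WSL1', or None based on the kernel release string.

    Assumes the release ends with '-microsoft-standard-WSL2' on WSL2 and
    contains 'microsoft' on WSL1 (the WSL1 suffix is unverified).
    """
    release = platform.uname().release
    if release.endswith("-WSL2"):
        return "WSL2"
    if "microsoft" in release.lower():
        return "WSL1"
    return None
```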
@nuffin You have great points, would you please create a separate ticket? I don't want to mix two different issues in one ticket.
Sure. I'm creating it.
While checking your result statistics on https://llm.aidatatools.com/ I always missed an indication of whether the model was completely loaded into the GPU or whether it ran in a mixed CPU/GPU environment.
Implementing such a check could be low-hanging fruit, as ollama keeps the last model loaded after the request issued at run_benchmark.py:75 has finished: `result = subprocess.run([ollamabin, 'run', model_name, one_prompt['prompt'], '--verbose'], capture_output=True, text=True, check=True, encoding='utf-8')`
If you add another call, `subprocess.run([ollamabin, 'ps'], capture_output=True, text=True, check=True, encoding='utf-8')`, you can still gather the utilization.
e.g.:

```
$ ollama ps
NAME          ID              SIZE    PROCESSOR          UNTIL
qwen2:1.5b    f6daf2b25194    1.8 GB  100% GPU           4 minutes from now

$ ollama ps
NAME          ID              SIZE    PROCESSOR          UNTIL
llama3:70b    786f3184aec0    41 GB   79%/21% CPU/GPU    4 minutes from now
```
Based on the ollama documentation, it will be possible to have several models loaded at the same time. So you need to expect that, in the future, `ollama ps` will report several rows of models. Filtering the `ollama ps` output by `model_name` should be future-proof, as in the sketch below.
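A hedged sketch of how that could look, assuming the `ollama ps` output format shown above (the function name and column parsing are illustrative assumptions, not existing code in run_benchmark.py):

```python
import subprocess
from typing import Optional

def get_processor_split(ollamabin: str, model_name: str) -> Optional[str]:
    """Return the PROCESSOR column for model_name (e.g. '100% GPU' or
    '79%/21% CPU/GPU'), or None if the model is not currently loaded.

    Skips the header row and filters by model name, so it keeps working
    if several models are loaded at once.
    """
    result = subprocess.run([ollamabin, 'ps'], capture_output=True,
                            text=True, check=True, encoding='utf-8')
    for line in result.stdout.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if not fields or fields[0] != model_name:
            continue
        # Assumed layout: NAME ID SIZE UNIT PROCESSOR [LABEL] UNTIL...
        # e.g. ['qwen2:1.5b', 'f6daf2b25194', '1.8', 'GB', '100%', 'GPU', ...]
        return ' '.join(fields[4:6])
    return None
```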
Finally, it would be great to see that GPU/CPU usage distribution on your results pages.