Open dan-and opened 5 months ago
I suggest adding a "WSL" (Windows Subsystem for Linux) category, or at least a tag within the Linux category. To detect whether a Linux distribution is running in WSL, just check whether the output of the `uname -r` command (or `platform.uname().release` in Python) ends with `-WSL2`:

`5.15.146.1-microsoft-standard-WSL2`

I'm not sure about the suffix for WSL1, maybe `-Microsoft` (found from this file). Linux on bare metal and Linux in WSL use different CUDA libraries and are not the same environment.
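A minimal sketch of such a check, assuming the kernel-release suffixes mentioned above (the helper name `detect_wsl` is illustrative, not existing code in this project):

```python
import platform
from typing import Optional

def detect_wsl() -> Optional[str]:
    """Return 'WSL2', 'WSL1', or None based on the kernel release string.

    Assumes the release ends with '-microsoft-standard-WSL2' on WSL2 and
    contains 'microsoft' on WSL1 (the WSL1 suffix is unverified).
    """
    release = platform.uname().release
    if release.endswith("-WSL2"):
        return "WSL2"
    if "microsoft" in release.lower():
        return "WSL1"
    return None
```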
@nuffin You have great points, would you please create a separate ticket? I don't want to mix two different issues in one ticket.
Sure. I'm creating it.
While checking your result statistics on https://llm.aidatatools.com/ I always missed an indication of whether the model was completely loaded into the GPU or whether it ran in a mixed CPU/GPU environment.
Implementing such a check could be low-hanging fruit, as ollama keeps the last model loaded after the request issued at run_benchmark.py:75 has finished: `result = subprocess.run([ollamabin, 'run', model_name, one_prompt['prompt'], '--verbose'], capture_output=True, text=True, check=True, encoding='utf-8')`
If you add another call, `subprocess.run([ollamabin, 'ps'], capture_output=True, text=True, check=True, encoding='utf-8')`, you can still gather the utilization.
e.g.:

```
$ ollama ps
NAME          ID              SIZE    PROCESSOR          UNTIL
qwen2:1.5b    f6daf2b25194    1.8 GB  100% GPU           4 minutes from now

$ ollama ps
NAME          ID              SIZE    PROCESSOR          UNTIL
llama3:70b    786f3184aec0    41 GB   79%/21% CPU/GPU    4 minutes from now
```
Based on the ollama documentation, it will be possible to have several models loaded at the same time. So you need to expect that, in the future, `ollama ps` will report several rows of models. Filtering the `ollama ps` output by `model_name` should be future-proof, as in the sketch below.
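A hedged sketch of how that could look, assuming the `ollama ps` output format shown above (the function name and column parsing are illustrative assumptions, not existing code in run_benchmark.py):

```python
import subprocess
from typing import Optional

def get_processor_split(ollamabin: str, model_name: str) -> Optional[str]:
    """Return the PROCESSOR column for model_name (e.g. '100% GPU' or
    '79%/21% CPU/GPU'), or None if the model is not currently loaded.

    Skips the header row and filters by model name, so it keeps working
    if several models are loaded at once.
    """
    result = subprocess.run([ollamabin, 'ps'], capture_output=True,
                            text=True, check=True, encoding='utf-8')
    for line in result.stdout.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if not fields or fields[0] != model_name:
            continue
        # Assumed layout: NAME ID SIZE UNIT PROCESSOR [LABEL] UNTIL...
        # e.g. ['qwen2:1.5b', 'f6daf2b25194', '1.8', 'GB', '100%', 'GPU', ...]
        return ' '.join(fields[4:6])
    return None
```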
Finally, it would be great to see that GPU/CPU usage distribution on your results pages.