dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License

Not sure if ollama:r36.2.0 is using GPU #491

Open · UserName-wang opened this issue 2 months ago

UserName-wang commented 2 months ago

Dear @dusty-nv, I pulled dustynv/ollama:r36.2.0 on a Jetson Orin 32GB dev kit and ran the command `jetson-containers run --name ollama $(autotag ollama)`. The output is:

```
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.

[GIN-debug] POST   /api/pull                 --> github.com/ollama/ollama/server.(*Server).PullModelHandler-fm (5 handlers)
[GIN-debug] POST   /api/generate             --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers)
[GIN-debug] POST   /api/chat                 --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers)
[GIN-debug] POST   /api/embeddings           --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
[GIN-debug] POST   /api/create               --> github.com/ollama/ollama/server.(*Server).CreateModelHandler-fm (5 handlers)
[GIN-debug] POST   /api/push                 --> github.com/ollama/ollama/server.(*Server).PushModelHandler-fm (5 handlers)
[GIN-debug] POST   /api/copy                 --> github.com/ollama/ollama/server.(*Server).CopyModelHandler-fm (5 handlers)
[GIN-debug] DELETE /api/delete               --> github.com/ollama/ollama/server.(*Server).DeleteModelHandler-fm (5 handlers)
[GIN-debug] POST   /api/show                 --> github.com/ollama/ollama/server.(*Server).ShowModelHandler-fm (5 handlers)
[GIN-debug] POST   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
[GIN-debug] HEAD   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
[GIN-debug] POST   /v1/chat/completions      --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
[GIN-debug] GET    /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] GET    /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListModelsHandler-fm (5 handlers)
[GIN-debug] GET    /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD   /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] HEAD   /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListModelsHandler-fm (5 handlers)
[GIN-debug] HEAD   /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
time=2024-04-27T00:32:16.148Z level=INFO source=routes.go:1064 msg="Listening on [::]:11434 (version 0.0.0)"
time=2024-04-27T00:32:16.149Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama359642117/runners
time=2024-04-27T00:32:26.579Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cuda_v12]"
time=2024-04-27T00:32:26.579Z level=INFO source=gpu.go:96 msg="Detecting GPUs"
time=2024-04-27T00:32:26.657Z level=INFO source=gpu.go:101 msg="detected GPUs" library=/tmp/ollama359642117/runners/cuda_v12/libcudart.so.12 count=1
time=2024-04-27T00:32:26.658Z level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
```

Inside the container I tried several models (llama3:latest, llava:34b) and checked GPU usage with `nvidia-smi`. The output is always:

```
Sat Apr 27 08:27:01 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 540.2.0                Driver Version: N/A          CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Orin (nvgpu)                   N/A | N/A              N/A |                  N/A |
| N/A  N/A    N/A            N/A /    N/A |        Not Supported |      N/A         N/A |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
```

It seems the GPU is not being used by ollama?

Token output from llama3:latest is quite fast, but llava:34b is quite slow, and llava:34b's CPU usage is much higher than llama3's.
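For reference, one way to drive a generation through the ollama API while watching a GPU monitor in another terminal. This is a minimal sketch based on the routes visible in the GIN log above; the port 11434 comes from the "Listening on [::]:11434" line, and the model name is one from this thread:

```bash
# Pull a model through the API (the /api/pull route from the log above),
# then run a one-off generation. Watch GPU load in a second terminal
# while the generate request is running.
curl http://localhost:11434/api/pull -d '{"name": "llama3"}'
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```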

TadayukiOkada commented 2 months ago

Have you tried jtop (https://github.com/rbonghi/jetson_stats)? I see GPU activity in jtop when I run ollama, though I'm on JetPack 5.1.3. A sketch of installing and running it is below. Note that llama3:latest is the 8B model at q4 quantization, so it's expected that llava:34b is slower. llama3:70b would be much slower still (and you might not be able to run a 70B model on a 32GB Orin at all).
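As mentioned above, a minimal sketch for getting jtop running, assuming pip3 is available on the Jetson host (jetson-stats is installed on the host, not inside the container):

```bash
# Install jetson-stats on the host; it provides the jtop monitor.
sudo pip3 install -U jetson-stats

# A reboot (or restarting the jtop service / re-logging in) may be
# needed after the first install before jtop can read the stats.
jtop
```

While a model is generating, the GPU bar in jtop should show activity if ollama is actually using CUDA.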

dusty-nv commented 2 months ago

@UserName-wang yes, nvidia-smi isn't fully supported on Jetson. As @TadayukiOkada suggested, use jtop or tegrastats instead, and for an optimized VLM see https://www.jetson-ai-lab.com/tutorial_nano-vlm.html
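For reference, a minimal tegrastats check; tegrastats ships with JetPack/L4T, and the GR3D_FREQ field in its output is the GPU load:

```bash
# Print utilization once per second; GR3D_FREQ is the GPU utilization.
# It should spike above 0% while ollama is generating on the GPU.
sudo tegrastats --interval 1000
```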

UserName-wang commented 2 months ago

Thank you for your help, @dusty-nv @TadayukiOkada! jtop confirms the GPU is being used.