dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T

ollama failed to run #493

Closed Aspirinkb closed 2 months ago

Aspirinkb commented 2 months ago

Running the dustynv/ollama:r36.2.0 container on an AGX Orin 64GB with JetPack 6.0 DP [L4T 36.2.0], ollama cannot run an LLM:

root@ubuntu:/# ollama run phi3
pulling manifest
pulling 4fed7364ee3e... 100% ▕████████████████████████████████████████████████████████████████▏ 2.3 GB
pulling c608dc615584... 100% ▕████████████████████████████████████████████████████████████████▏  149 B
pulling fa8235e5b48f... 100% ▕████████████████████████████████████████████████████████████████▏ 1.1 KB
pulling d47ab88b61ba... 100% ▕████████████████████████████████████████████████████████████████▏  140 B
pulling f7eda1da5a81... 100% ▕████████████████████████████████████████████████████████████████▏  485 B
verifying sha256 digest
writing manifest
removing any unused layers
success
Error: llama runner process no longer running: -1 CUDA error: CUBLAS_STATUS_EXECUTION_FAILED
  current device: 0, in function ggml_cuda_mul_mat_batched_cublas at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml-cuda.cu:1848
  cublasGemmBatchedEx(ctx.cublas_handle(), CUBLAS_OP_T, CUBLAS_OP_N, ne01, ne11, ne10, alpha, (const void **) (ptrs_src.get() + 0*ne23), CUDA_R_16F, nb01/nb00, (const void **) (ptrs_src.get() + 1*ne23), CUDA_R_16F, nb11/nb10, beta, ( void **) (ptrs_dst.get() + 0*ne23), cu_data_type, ne01, ne23, cu_compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml-cuda.cu:60: !"CUDA error"
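For context, the report doesn't show how the container was launched. A typical launch on JetPack 6, going by the jetson-containers README (the exact flags here are an assumption, not taken from the report):

# Helper script from this repo: picks the image tag matching the installed L4T version
jetson-containers run $(autotag ollama)

# Or run the reported image directly with the NVIDIA container runtime
sudo docker run --runtime nvidia -it --rm --network=host dustynv/ollama:r36.2.0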
Aspirinkb commented 2 months ago

I found the same situation reported in an issue on the Ollama repo.

Aspirinkb commented 2 months ago

I have resolved the issue; it was a silly mistake on my part. In short, I had already started the Ollama service on the host before launching the container, so when I executed ollama run phi3 inside the container, the request was actually handled by the Ollama service outside the container, not the one inside.

So once I shut down the Ollama service outside the container, started the one inside the container, and ran the model again, it worked successfully.
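For anyone hitting the same symptom, the fix amounts to making sure the ollama CLI inside the container talks to the server inside the container, not one on the host. A minimal sketch, assuming a native host install that registered Ollama's default systemd unit and its default port 11434 (adjust if your setup differs):

# On the host: stop the native Ollama service so it stops answering on the default port (11434)
sudo systemctl stop ollama       # assumes the official install script created the "ollama" systemd unit
sudo ss -ltnp | grep 11434       # verify no host process is still listening

# Inside the container: start the bundled server, then run the model again
ollama serve &                   # skip if the container entrypoint already starts a server
ollama run phi3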