NVIDIA-AI-IOT / jetson-copilot

A reference application for a local AI assistant with LLM and RAG
Apache License 2.0

Unable to download the llama3.1 model #14

Open jedld opened 1 month ago

jedld commented 1 month ago

When attempting to download "llama3.1" via the download new model UI, I'm getting:

It looks like "llama3.1" is not the right name.

This error does not happen for the other llama models.
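For reference, pulling the model straight through the Ollama Python client (bypassing the download UI) should show whether the name itself is being rejected. A rough sketch, assuming the Ollama server inside the container is on its default http://localhost:11434:

```python
import ollama

# Assumes the Ollama server started by the container is on its default port.
client = ollama.Client(host="http://localhost:11434")

try:
    # Same tag as typed into the "download new model" UI.
    print(client.pull("llama3.1"))
except ollama.ResponseError as err:
    # A 404 / unknown-model response here would mean the registry rejects the
    # name; any other error would point back at the copilot UI code instead.
    print("pull failed:", err)
```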

brkstyl commented 1 month ago

I am also having a similar problem. I have tried to load additional models, from Gemma to Starcoder, and none of the models from the Ollama library will load; each one gives me the same error.

andrebaumgartfht commented 1 month ago

We are having the same issue. Is there a specific version we should use? We are using JetPack 6.1 [L4T 36.4.0]. Thanks.

dusty-nv commented 1 month ago

Hi @andrebaumgartfht can you try dustynv/jetson-copilot:r36.4.0 ?

andrebaumgartfht commented 1 month ago

Thank you for the build.

Did the following: created the documents folder locally in my jetson-containers repo (mkdir -p ./data/documents/jetson) and added a PDF file to it, since an empty folder causes another exception (the container apparently defaults to RAG-only operation).

Then started the container using jetson-containers run dustynv/jetson-copilot:r36.4.0 bash -c '/start_ollama && streamlit run app.py'.
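(A minimal reachability check like the following, assuming Ollama's default port 11434, confirms the server launched by /start_ollama is up before the Streamlit app starts indexing:)

```python
import ollama

# Sketch only: verifies the server launched by /start_ollama is responding
# on Ollama's default port before indexing begins.
client = ollama.Client(host="http://localhost:11434")
print(client.list())  # lists the models currently available locally
```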

Loading and indexing the Jetson docs started ...

Then raised the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/streamlit/runtime/scriptrunner/exec_code.py", line 88, in exec_func_with_error_handling
    result = func()
  File "/usr/local/lib/python3.10/dist-packages/streamlit/runtime/scriptrunner/script_runner.py", line 579, in code_to_exec
    exec(code, module.__dict__)
  File "/opt/jetson-copilot/app.py", line 55, in <module>
    index = load_data()
  File "/usr/local/lib/python3.10/dist-packages/streamlit/runtime/caching/cache_utils.py", line 212, in __call__
    return self._get_or_create_cached_value(args, kwargs)
  File "/usr/local/lib/python3.10/dist-packages/streamlit/runtime/caching/cache_utils.py", line 235, in _get_or_create_cached_value
    return self._handle_cache_miss(cache, value_key, func_args, func_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/streamlit/runtime/caching/cache_utils.py", line 292, in _handle_cache_miss
    computed_value = self._info.func(*func_args, **func_kwargs)
  File "/opt/jetson-copilot/app.py", line 52, in load_data
    index = VectorStoreIndex.from_documents(docs)
  File "/usr/local/lib/python3.10/dist-packages/llama_index/core/indices/base.py", line 119, in from_documents
    return cls(
  File "/usr/local/lib/python3.10/dist-packages/llama_index/core/indices/vector_store/base.py", line 76, in __init__
    super().__init__(
  File "/usr/local/lib/python3.10/dist-packages/llama_index/core/indices/base.py", line 77, in __init__
    index_struct = self.build_index_from_nodes(
  File "/usr/local/lib/python3.10/dist-packages/llama_index/core/indices/vector_store/base.py", line 310, in build_index_from_nodes
    return self._build_index_from_nodes(content_nodes, **insert_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/llama_index/core/indices/vector_store/base.py", line 279, in _build_index_from_nodes
    self._add_nodes_to_index(
  File "/usr/local/lib/python3.10/dist-packages/llama_index/core/indices/vector_store/base.py", line 232, in _add_nodes_to_index
    nodes_batch = self._get_node_with_embedding(nodes_batch, show_progress)
  File "/usr/local/lib/python3.10/dist-packages/llama_index/core/indices/vector_store/base.py", line 139, in _get_node_with_embedding
    id_to_embed_map = embed_nodes(
  File "/usr/local/lib/python3.10/dist-packages/llama_index/core/indices/utils.py", line 138, in embed_nodes
    new_embeddings = embed_model.get_text_embedding_batch(
  File "/usr/local/lib/python3.10/dist-packages/llama_index/core/instrumentation/dispatcher.py", line 307, in wrapper
    result = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/llama_index/core/base/embeddings/base.py", line 335, in get_text_embedding_batch
    embeddings = self._get_text_embeddings(cur_batch)
  File "/usr/local/lib/python3.10/dist-packages/llama_index/embeddings/ollama/base.py", line 75, in _get_text_embeddings
    embeddings = self.get_general_text_embedding(text)
  File "/usr/local/lib/python3.10/dist-packages/llama_index/embeddings/ollama/base.py", line 88, in get_general_text_embedding
    result = self._client.embeddings(
  File "/usr/local/lib/python3.10/dist-packages/ollama/_client.py", line 281, in embeddings
    return self._request(
  File "/usr/local/lib/python3.10/dist-packages/ollama/_client.py", line 75, in _request
    raise ResponseError(e.response.text, e.response.status_code) from None
ollama._types.ResponseError: llama runner process has terminated: signal: aborted (core dumped)
CUDA error: CUBLAS_STATUS_INTERNAL_ERROR
  current device: 0, in function ggml_cuda_op_mul_mat_cublas at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml-cuda.cu:1240
  cublasGemmEx(ctx.cublas_handle(id), CUBLAS_OP_T, CUBLAS_OP_N, row_diff, src1_ncols, ne10, &alpha_f16, src0_ptr, CUDA_R_16F, ne00, src1_ptr, CUDA_R_16F, ne10, &beta_f16, dst_f16.get(), CUDA_R_16F, ldc, CUBLAS_COMPUTE_16F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml-cuda.cu:100: !"CUDA error"
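The failure is raised by the Ollama embeddings endpoint rather than by llama_index itself, so it can be reproduced without the Streamlit app. A minimal sketch, assuming the configured embedding model is mxbai-embed-large (an assumption; substitute whatever app.py actually sets) and the default port 11434:

```python
import ollama

client = ollama.Client(host="http://localhost:11434")

try:
    # Mirrors the call llama_index makes in get_general_text_embedding();
    # "mxbai-embed-large" is an assumption about the configured embedding model.
    result = client.embeddings(model="mxbai-embed-large",
                               prompt="Jetson Orin test sentence")
    print("embedding length:", len(result["embedding"]))
except ollama.ResponseError as err:
    # Seeing the same "llama runner process has terminated ...
    # CUBLAS_STATUS_INTERNAL_ERROR" here would confirm the crash is inside the
    # Ollama/llama.cpp CUDA backend on this JetPack build, independent of the
    # indexing code.
    print("embeddings call failed:", err)
```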