-
### What is the issue?
Ollama is failing to run on the GPU and falls back to the CPU instead. If I force it with `HSA_OVERRIDE_GFX_VERSION=9.0.0`, I get `Error: llama runner process has terminated: signal: abo…
-
With many claiming that phi3 mini is uncannily good for its size, and with larger, actually-useful phi3 models on the way, adding support for this arch is almost certainly worthwhile.
-
### What is the issue?
We are setting `OLLAMA_MAX_LOADED_MODELS=4` in our systemd override file for the ollama service:
![image](https://github.com/ollama/ollama/assets/48829375/b09c1dda-a196-4b89-b34…
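For context, the drop-in is roughly of this shape (a minimal sketch; the exact file path and value are specific to our setup, and `systemctl daemon-reload` plus a restart of the service is needed after editing):

```ini
# /etc/systemd/system/ollama.service.d/override.conf  (sketch of the drop-in)
[Service]
Environment="OLLAMA_MAX_LOADED_MODELS=4"
```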
-
Is it possible to implement MemGPT as a feature available to all agents, rather than as a separate agent as discussed in #530?
-
![error](https://github.com/user-attachments/assets/c6a351db-0074-4db7-bc68-9b6eb9f3081f)
After running the app.py file and putting the model in the web_app_storage/models folder, I get this er…
-
https://python.langchain.com/en/latest/use_cases/question_answering/semantic-search-over-chat.html
https://github.com/hwchase17/langchain/blob/master/docs/use_cases/question_answering/semantic-sear…
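Those pages follow the same pattern: split the chat log, embed the chunks into a vector store, then answer questions with a retrieval chain. Below is a minimal sketch using the legacy `langchain` imports those docs were written against (FAISS as the store, `RetrievalQA` for answering); module paths have since moved in newer releases, and `chat_history.txt` plus the OpenAI classes (which require an `OPENAI_API_KEY`) are placeholder assumptions:

```python
# Sketch: semantic search over a chat transcript with the legacy langchain API.
# Assumes OPENAI_API_KEY is set and chat_history.txt exists (both placeholders).
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Load and split the chat transcript into chunks for embedding.
with open("chat_history.txt") as f:
    raw_chat = f.read()
splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_text(raw_chat)

# Embed the chunks and index them in FAISS for similarity search.
store = FAISS.from_texts(chunks, OpenAIEmbeddings())

# Wire the retriever into a question-answering chain.
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),
    chain_type="stuff",
    retriever=store.as_retriever(search_kwargs={"k": 4}),
)
print(qa.run("Who suggested meeting on Friday?"))
```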
-
How can I use the ONNX model of Phi-3 mini 128k for faster inference on a local, CPU-only machine? Can you provide the code to do it?
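Not a definitive answer, but the CPU path generally goes through `onnxruntime-genai`, the package behind the phi3-qa.py example. A minimal sketch follows; the generator-loop method names differ slightly between package versions, and `model_dir` is a placeholder for wherever the cpu-int4 model folder was downloaded:

```python
# Sketch: CPU-only generation with the Phi-3 mini 128k ONNX model.
# Install with: pip install onnxruntime-genai   (CPU build)
# model_dir is a placeholder for the downloaded cpu-int4 model folder.
import onnxruntime_genai as og

model_dir = "cpu_and_mobile/cpu-int4-rtn-block-32"
model = og.Model(model_dir)
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

# Phi-3 chat template around the user question.
prompt = "<|user|>\nWhat is the capital of France?<|end|>\n<|assistant|>\n"
params = og.GeneratorParams(model)
params.set_search_options(max_length=512)
params.input_ids = tokenizer.encode(prompt)

# Greedy decode loop, printing tokens as they are produced.
generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```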
-
## Description:
Selecting one of the following models for the final response generation.
- Phi-3-mini (4k and 128k)
- Llama3-8B (8k)
- google/gemma-7b
## Criteria
- context length
- response time
…
-
I downloaded the phi3-mini-128k-instruct-onnx model (cpu_and_mobile/cpu-int4-rtn-blocks-32) from Hugging Face and used phi3-qa.py to run text generation, following the instructions in the [readme]…
-
I've been testing out phi3-128k, but I'm running into issues when using larger context windows (>4000 tokens).
With `cuda-fp16`, anything larger than 4096 gives me a memory allocation error, which is surprising …