-
### What is the issue?
We are setting `OLLAMA_MAX_LOADED_MODELS=4` in our systemd override file for the `ollama` service:
![image](https://github.com/ollama/ollama/assets/48829375/b09c1dda-a196-4b89-b34…
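For reference, an override of this shape (illustrative file path; such an override is typically created with `systemctl edit ollama.service`):

```ini
# /etc/systemd/system/ollama.service.d/override.conf  (illustrative path)
[Service]
Environment="OLLAMA_MAX_LOADED_MODELS=4"
```

After editing, `systemctl daemon-reload` followed by `systemctl restart ollama` is needed for the variable to take effect.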
-
**Is your feature request related to a problem? Please describe.**
My company unfortunately cannot benefit from these AI power tools, because anything that involves uploading code remotely will be violatin…
-
How can I use the ONNX model of Phi-3 mini 128k for faster inference on a CPU-only local machine? Can you provide the code to do it?
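A minimal sketch using the `onnxruntime-genai` package. The API names below follow the pattern of the `phi3-qa.py` example shipped with that package and may differ between releases, so check the examples for the version you install (`pip install onnxruntime-genai`); the model directory is assumed to be the CPU int4 folder downloaded from the Hugging Face repo:

```python
def generate_cpu(model_dir: str, question: str, max_length: int = 256) -> str:
    """Run Phi-3 mini ONNX generation on CPU via onnxruntime-genai.

    Sketch only: API names are taken from the onnxruntime-genai examples
    and may vary across releases.
    """
    import onnxruntime_genai as og  # deferred import: optional dependency

    model = og.Model(model_dir)      # e.g. the cpu int4 folder from Hugging Face
    tokenizer = og.Tokenizer(model)

    # Phi-3 chat template, as documented on the model card
    prompt = f"<|user|>\n{question} <|end|>\n<|assistant|>"

    params = og.GeneratorParams(model)
    params.set_search_options(max_length=max_length)

    generator = og.Generator(model, params)
    generator.append_tokens(tokenizer.encode(prompt))
    while not generator.is_done():
        generator.generate_next_token()
    return tokenizer.decode(generator.get_sequence(0))
```

On CPU the int4-quantized variant is the intended one; the fp16/fp32 exports in the same repo target GPU and will be much slower here.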
-
## Description:
Selecting one of the following models for the final response generation.
- Phi-3-mini (4k and 128k)
- Llama3-8B (8k)
- google/gemma-7b
## Criteria
- context length
- response time
…
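Since response time is one of the criteria, it can be compared across the candidates with a small stdlib harness; the `generate` callable is a hypothetical wrapper around whichever backend serves each model:

```python
import time
from statistics import median

def measure_latency(generate, prompt, runs=5):
    """Median wall-clock latency of `generate(prompt)` over several runs.

    `generate` is any callable returning the model's text, e.g. a thin
    wrapper around a Phi-3, Llama3-8B, or gemma-7b client (hypothetical;
    plug in whichever backend is under test).
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        generate(prompt)
        timings.append(time.perf_counter() - start)
    return median(timings)
```

Warm up each model with one untimed call first, since the first request usually includes load and compile overhead that would skew the comparison.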
-
I downloaded the phi3-mini-128k-instruct-onnx model (cpu_and_mobile/cpu-int4-rtn-blocks-32) from Hugging Face, and used phi3-qa.py to run text generation, following the instructions in the [readme]…
-
Dubious! I'll start my explanation with this deliberately provocative adjective, so as to move the discussion forward and find my mistake.
On the web you can see a craze for fine-tuning (with unsloth or o…
-
**Describe the bug**
If two requests are sent to the server at roughly the same time, it will start to respond to both requests and then crash with the following error message:
```
ERROR mistral…
```
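For reproduction, two requests can be fired at the server at the same moment with a small stdlib sketch; the endpoint URL and JSON shape are placeholders for the server's actual API:

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def post(url, payload):
    """POST a JSON body and return the response text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()

def send_concurrently(url, payloads):
    """Fire all payloads at once, so both requests are in flight together."""
    with ThreadPoolExecutor(max_workers=len(payloads)) as pool:
        return list(pool.map(lambda p: post(url, p), payloads))
```

Pointing `send_concurrently` at the server's generation endpoint with two prompts should trigger the overlapping-request situation described above.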
-
https://python.langchain.com/en/latest/use_cases/question_answering/semantic-search-over-chat.html
https://github.com/hwchase17/langchain/blob/master/docs/use_cases/question_answering/semantic-sear…
-
### Your current environment
Docker, `latest` image tag for vLLM 0.5.3:
```
docker pull vllm/vllm-openai:latest
docker run -d --restart=always \
--runtime=nvidia \
--gpus '"device=1"' \
  --shm-size…
```