-
TLDR: When offloading all layers to the GPU, RAM usage is the same as if no layers were offloaded. In situations where VRAM is sufficient to load the model but RAM is not, a CUDA Out of Memory error occur…
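For anyone reproducing this, here is a minimal sketch using llama-cpp-python (an assumption about the backend; the model path is a placeholder). `n_gpu_layers` and `use_mmap` are the parameters that most directly affect the RAM/VRAM split, and memory-mapping is one plausible reason RAM usage looks unchanged after full offload:
```
# Sketch only: assumes llama-cpp-python and a local GGUF file at ./model.gguf.
from llama_cpp import Llama

llm = Llama(
    model_path="./model.gguf",  # hypothetical path
    n_gpu_layers=-1,   # -1 offloads all layers to the GPU
    use_mmap=True,     # the model file is memory-mapped, so resident memory
                       # can still look as large as the full model
    use_mlock=False,   # True would additionally pin the mapped pages in RAM
)
print(llm("Hello", max_tokens=8))
```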
-
I'm getting this error continuously. Which parameters affect this?
![image](https://github.com/AIAnytime/Llama2-Medical-Chatbot/assets/128571697/25a986dc-8791-45d9-bc93-7d47d8b100c9)
-
I'm using an uncensored model; the issue happened with uncensored-latest, uncensored 70b, and every other uncensored model I tried. Sometimes when I prompt the model, after it makes a response, it will prompt itself…
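If the backend is llama-cpp-python (an assumption; the report doesn't name the tool), a model prompting itself after its response usually means no stop sequences are set. A minimal sketch, with stop strings that are illustrative guesses for a chat-style prompt format:
```
# Sketch: assumes llama-cpp-python; the path and stop strings are hypothetical.
from llama_cpp import Llama

llm = Llama(model_path="./model.gguf", n_gpu_layers=-1)
out = llm(
    "USER: Hello\nASSISTANT:",
    max_tokens=256,
    stop=["USER:"],  # cut generation before the model writes the next turn itself
)
print(out["choices"][0]["text"])
```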
-
### Describe the bug
I'm failing to run any GPTQ models, even 7b. I can run HF models without a problem, but then I have a problem with large files for anything over 7b. I have a Tesla K80 and it only supports CUDA …
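One way to confirm whether the card itself is the blocker (a diagnostic sketch only; it reports the hardware, it doesn't prove which kernel is failing):
```
# Diagnostic sketch: prints the CUDA compute capability PyTorch sees.
# A Tesla K80 reports (3, 7); many recent GPTQ kernels target newer
# architectures, so a low value here is a plausible culprit.
import torch

for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)} (sm_{major}{minor})")
```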
-
Hi, I'm using Arch Linux and ollama is installed locally on my machine. I installed ollama-telegram the non-Docker way, running it in a Python venv. Python version: Python 3.11.6
I've set the Enviro…
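To rule out the venv as the problem, a quick check (assuming the default ollama address and the `requests` package installed in that venv) that the bot's environment can actually reach ollama:
```
# Sketch: verifies ollama is reachable from inside the venv.
# Assumes the default address; adjust if OLLAMA_HOST is set to something else.
import os
import requests

host = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
resp = requests.get(f"{host}/api/tags", timeout=5)
print(resp.status_code, resp.json())  # expect 200 plus a list of local models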
-
I've tried running it in the CLI and via a network request, but I'm getting replies in Mandarin Chinese.
Has anyone faced this?
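Until the root cause is found, one workaround (a sketch, assuming an ollama `/api/chat` endpoint; the report doesn't say which server is used, and the model name is a placeholder) is to pin the language with a system message:
```
# Sketch: assumes the ollama /api/chat endpoint and the requests package.
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama2",  # hypothetical model name
        "stream": False,
        "messages": [
            {"role": "system", "content": "Always answer in English."},
            {"role": "user", "content": "Hello, who are you?"},
        ],
    },
)
print(resp.json()["message"]["content"])
```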
-
I'm using it with a local uncensored Llama model by adding
```
openai.api_base = "URL"
```
under the `import openai` line in the code, and it's working great! Thank you for making this!
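For anyone copying this pattern, a fuller self-contained sketch (assuming the pre-1.0 `openai` Python client, where `api_base` is a module attribute; the URL and model name are placeholders):
```
# Sketch: pre-1.0 openai client pointed at an OpenAI-compatible local server.
import openai

openai.api_base = "http://localhost:8000/v1"  # hypothetical local endpoint
openai.api_key = "not-needed"                 # most local servers ignore the key

resp = openai.ChatCompletion.create(
    model="local-model",  # hypothetical model name
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp["choices"][0]["message"]["content"])
```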
-
1. Hello, llama.bin should be stored in the same folder as hackbot.py, but I can't find it anywhere... where could it be?) But it works. Thanks a lot for your work.
2. Can it work with other datasets, models …
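A quick way to see where the script actually looks for the weights (a sketch; the filename `llama.bin` is taken from the question, not verified against the repo):
```
# Sketch: checks for llama.bin next to the running script.
from pathlib import Path

expected = Path(__file__).resolve().parent / "llama.bin"
print(expected, "exists" if expected.exists() else "is missing")
```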
-
ollama isn't responding to
```
curl http://localhost:11434/api/show --json '{"name": "codellama:7b-instruct"}'
404 page not found
```
and I didn't configure ollama to start on a particular port, …
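The same call expressed with `requests` (a sketch, assuming the default port), which makes it easy to print the status and body while debugging; note that `/api/show` expects a POST, and a 404 may simply mean the installed ollama build predates that endpoint:
```
# Sketch: POSTs to /api/show the same way the curl --json call does.
import requests

resp = requests.post(
    "http://localhost:11434/api/show",
    json={"name": "codellama:7b-instruct"},
)
print(resp.status_code)
print(resp.text)  # 404 here can mean the ollama build predates /api/show
```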
-
The latest commit on main suddenly stopped working with GPTQ models on my two Nvidia GTX 1080s:
```
(localGPT-Bandera) [hedgar@hedgaron-prime localGPT]$ python run_localGPT.py --device_type cuda
2023…
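One way to narrow down a multi-GPU regression like this (a sketch; it only restricts which card the process sees and must run before anything initializes CUDA):
```
# Sketch: pin the process to a single GPU to test whether the failure
# is specific to the multi-GPU path.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # try "1" for the other card

import torch
print(torch.cuda.device_count())  # should now report 1
```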