-
There are some new models that are being released only in LoRA adapter form (such as [this one](https://huggingface.co/kaiokendev/SuperCOT-LoRA/tree/main)). Since there is no merge released, th…
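For anyone who needs merged weights in the meantime, a minimal sketch of merging the adapter yourself with Hugging Face peft before converting for llama.cpp (the base model name and output path here are assumptions, not something specified by the adapter's authors):

```python
# Hedged sketch: bake a LoRA adapter into base weights, then convert/quantize
# as usual. Base model name and output path are placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("huggyllama/llama-13b")
model = PeftModel.from_pretrained(base, "kaiokendev/SuperCOT-LoRA")
merged = model.merge_and_unload()  # folds the adapter into the base weights
merged.save_pretrained("./supercot-merged")
```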
-
I downloaded some models from here for testing: https://rentry.org/nur779
CPU usage goes up to 60% for about 15 seconds, then it dies. If I don't load any character it works with the default settings, but it's really sc…
-
**Problem:**
I am aware everyone has different results; in my case I am running llama.cpp with a 4090 as the primary card and a 3090 as the secondary, so both are quite capable cards for LLMs.
I am getting around 800% s…
-
## Feature Request
#### Is your feature request related to a problem? Please describe.
Presently, in order to know whether a package update exists, one has to run `scoop status`.
But `scoop status` …
-
Obsoletes #147, #150, https://github.com/ggerganov/llama.cpp/issues/1575, https://github.com/ggerganov/llama.cpp/issues/1590, https://github.com/rustformers/llm/discussions/143, and probably some othe…
-
WHY did you guys end support for older Llama models? Why is backwards compatibility not added when you change formats? This is what pisses me off about open source, it's absolute freaking chaos, things…
-
I use the SSE streaming endpoint (/api/extra/generate/stream) in my application. I notice that with every request the prompt is not handled completely, but only a small part of it. Although in the cons…
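For reference, this is roughly how I consume the stream; a minimal sketch, assuming the default local port and a `token` field in each SSE event (both are assumptions on my side):

```python
import json
import requests

# Minimal sketch of consuming /api/extra/generate/stream; the host, port,
# payload fields, and the "token" event field are assumptions.
url = "http://localhost:5001/api/extra/generate/stream"
payload = {"prompt": "tell me a story", "max_length": 100}

with requests.post(url, json=payload, stream=True) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data:"):
            event = json.loads(line[len("data:"):].strip())
            print(event.get("token", ""), end="", flush=True)
```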
-
Sorry if this is vague. I'm not super technical, but I managed to get everything installed and working (sort of).
Anyway, when I entered the prompt "tell me a story", the response in the webUI was "O…
-
After noticing a large, clearly visible slowdown in the ooba text UI compared to llama.cpp, I wrote a test script to profile llama-cpp-python's high-level API:
```
from llama_cpp import Llama
ll…
```
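The script was cut off above; a minimal sketch of that kind of timing harness (the model path, prompt, and parameters are placeholders, not the original script's values) looks like:

```python
import time
from llama_cpp import Llama

# Hedged sketch of timing the high-level API; model path and prompt
# are placeholders.
llm = Llama(model_path="./model.bin")

start = time.perf_counter()
out = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
elapsed = time.perf_counter() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.2f}s ({n_tokens / elapsed:.1f} tok/s)")
```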
-
So back when the project started, we had the first "unversioned" model format without the embedded tokens, with the magic 0x67676d6c ("ggml").
The problem with that was that it didn't have any versioning sup…
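Since the magic is just the first four bytes of the file, a quick way to tell the formats apart is to read it directly; a minimal sketch (the file path is a placeholder):

```python
import struct

GGML_MAGIC = 0x67676D6C  # "ggml" read as a little-endian uint32

def read_magic(path: str) -> int:
    # The old unversioned format starts with this magic and no version field.
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    return magic

print(hex(read_magic("model.bin")))  # 0x67676d6c for the unversioned format
```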