-
After upgrading to version 0.1.27, performance has improved noticeably. Generation is still not very fast, but the program runs without significant lag. However, …
-
What are the system requirements to run the following sample code?
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM, WeightOnlyQ…
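Exact requirements depend on the model being loaded, but a rough rule of thumb for weight-only quantized inference is parameter count × bits-per-weight ÷ 8, plus some overhead for activations and the KV cache. A minimal sketch (the 1.2× overhead factor is an assumption, not a measured value):

```python
def quantized_model_ram_gb(n_params_billion: float, bits_per_weight: int,
                           overhead: float = 1.2) -> float:
    """Rough RAM estimate for weight-only quantized inference.

    n_params_billion: parameter count in billions (e.g. 7 for a 7B model).
    bits_per_weight: quantized width (4 for INT4, 8 for INT8).
    overhead: assumed fudge factor for activations, KV cache, and buffers.
    """
    weight_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 7B model at 4-bit needs very roughly 4-5 GB of RAM:
print(round(quantized_model_ram_gb(7, 4), 1))  # → ~4.2
```

By this estimate, a 7B model quantized to 4-bit fits comfortably in 8 GB of system RAM, while the same model at 8-bit would want roughly twice that.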
-
### What is the issue?
I sometimes find that Ollama runs a model on the CPU when it should be on the GPU. I just upgraded to v0.1.32 and am still trying to find out how to reproduce the issue. I don't …
-
### System Info
Hi,
When testing on `Google Colab (Free Tier T4 GPU)`, this code crashes with RAM OOM [(Notebook)](https://colab.research.google.com/drive/1zAzdcH_KRQuc_0zWBEzYuaV1h4ERgzPy?usp=s…
-
I've attached a screen capture of responses being truncated, and also an image of my Settings, just in case. I am trying prometheus-13b-v1.0.Q5_K_M.gguf, which seems similar to GPT-4 (sort …
-
I'm using Fedora 39 and the latest git version of llama.cpp [96e80da].
llama.cpp is built with CLBlast enabled (Intel Iris Xe GPU on a laptop).
I wanted to test the grammar feature of llama.cpp with the fol…
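The excerpt is cut off before the grammar itself, but llama.cpp grammars are written in GBNF. A minimal illustrative sketch (a hypothetical grammar, not the one from this report), which constrains the model's output to a single yes/no answer:

```
# GBNF: restrict generation to "yes" or "no" followed by a newline
root   ::= answer
answer ::= ("yes" | "no") "\n"
```

A grammar file like this is passed to the llama.cpp CLI via its grammar options, after which sampling can only produce tokens that match the rules.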
-
Thanks for the project! I have managed to run it on CPU at a decent speed (**6.2 - 6.8 tokens per second**); however, the model only generates a small piece of content, and the response…
-
Hi. I am trying to understand issues with a conversion of NeuralBeagle14, which does not correctly use stop words when prompted with ChatML.
It seems that the generated special_tokens_map.json, t…
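For ChatML prompts, generation should stop at `<|im_end|>`. A hedged sketch of what a ChatML-aware special_tokens_map.json typically sets (illustrative values, not the actual NeuralBeagle14 file):

```json
{
  "bos_token": "<s>",
  "eos_token": "<|im_end|>",
  "unk_token": "<unk>",
  "pad_token": "<|im_end|>"
}
```

If the conversion leaves `eos_token` at the base model's default (e.g. `</s>`), the runtime never treats `<|im_end|>` as a stop token, which would produce exactly the stop-word behavior described above.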
-
I've noticed that after running a few models, sometimes the models don't behave normally. This is a session where that was occurring. I had first tried with bakllava but it wasn't being helpful eithe…
-
When asked a strictly math question, it does fine. However, when asked "what is your knowledge", the answer is:
The answer is: Good.
The answer is: Good.
].join(',')
].join(','.split(…