-
English:
After upgrading to version 0.1.27, there has been a noticeable improvement in performance. Although the generation speed is not very fast, the program runs without significant lag. However, …
-
## 🐛 Bug
Has anyone encountered the situation where, using Qwen1.5-4B-Chat and Qwen1.5-1.8B-Chat in mlc-llm, clicking the chat entrance loads the model normally, but after starting…
-
Is it possible to use weights from Hugging Face, e.g. https://huggingface.co/google/gemma-2b-it ?
BTW - does it work on ROCm (AMD)?
-
Any invocation of `python -m sillm.chat model` seems much slower on my machine than in the reference video: more than a minute to get to the prompt, and maybe 1-2 TPM in the response.
I have tried si…
-
### What happened?
After installing Quivr on Amazon Linux and logging in with the default user ID/password, I am **not able to create the first brain**. Request your help as we are stuck with this iss…
-
It seems like the code is forced to run on the CPU (sending my computer out of RAM). If I print whether a torch GPU is available it says True, and it reports it's using the GPU, but the model still loads into CPU RAM. Looking into th…
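In case it helps narrow this down, here is a minimal sketch of what I would expect to keep the weights off CPU RAM (the checkpoint name is just a placeholder, and `device_map="auto"` assumes `accelerate` is installed):

```python
import torch
from transformers import AutoModelForCausalLM

# Confirm PyTorch actually sees the GPU.
print(torch.cuda.is_available())        # prints True on my machine
print(torch.cuda.get_device_name(0))

# Placeholder checkpoint; device_map="auto" (needs `accelerate`) should place
# the shards onto the GPU instead of fully materializing them in CPU RAM.
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b-it",
    torch_dtype=torch.float16,
    device_map="auto",
)
print(next(model.parameters()).device)  # expect cuda:0
```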
-
When running the below example, I get 'Parser reached state with no allowed tokens'. I believe this is due to one example within the batch finishing and subsequently having pad tokens generated for…
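For reproduction context, a minimal batched generation sketch (the model name is a placeholder, not the one I'm actually using) showing where the pad tokens come from: once one sequence in the batch hits EOS, `generate` fills its row with `pad_token_id` until the longest sequence finishes.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

# gpt2 has no pad token, so reuse EOS for padding; pad on the left so the
# generated continuations stay contiguous on the right.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

batch = tokenizer(
    ["Short prompt", "A much longer prompt about something"],
    return_tensors="pt",
    padding=True,
)

out = model.generate(**batch, max_new_tokens=20,
                     pad_token_id=tokenizer.pad_token_id)
# Rows that finish early are padded out with pad_token_id, which I assume is
# what the parser then sees as tokens it never allowed.
print(tokenizer.batch_decode(out, skip_special_tokens=False))
```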
-
### Is there an existing issue for the same bug?
- [X] I have checked the troubleshooting document at https://github.com/OpenDevin/OpenDevin/blob/main/docs/guides/Troubleshooting.md
- [X] I have chec…
-
**Describe the bug**
Hi, I saved a checkpoint and want to convert it to the safetensors format. This worked for the Phi-2 model, but when I try the Gemma model, an error is raised. Below is my code. The chec…
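Since my actual snippet is cut off above, here is a minimal sketch of the kind of conversion I mean (paths are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder paths: load the saved checkpoint and re-save it with
# safetensors serialization enabled.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/my-checkpoint",
    torch_dtype=torch.float16,
)
model.save_pretrained("path/to/converted", safe_serialization=True)
```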
-
Hi authors!
With the recent AQLM integration in transformers, would it make sense to quantize the Google Gemma models in 2-bit?
The list of the models can be found here: https://huggingface.co…
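For context, my understanding is that once an AQLM-quantized repo exists, consuming it from transformers is just a normal `from_pretrained` call (the repo id below is a hypothetical placeholder, and the `aqlm` package has to be installed alongside transformers):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id for a 2-bit AQLM Gemma checkpoint.
model_id = "someorg/gemma-2b-AQLM-2bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",  # assumes `accelerate` is installed
)
```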