-
### What happened?
Garbled output like "Mh giàu され rodas reliablyacheteurδε Są" appears when using a quantized K cache on CUDA with Gemma 2. Here's how to reproduce:
./llama-server -m "Gemma-2-9B-It-SPPO-I…
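For reference, a sketch of the kind of invocation involved (the model filename and quantization type below are assumptions, not the exact command above, which is truncated): llama.cpp's `-ctk`/`--cache-type-k` flag is what selects a quantized K cache.

```shell
# Hypothetical reproduction sketch -- model path and cache type are
# placeholders, not the reporter's exact values.
MODEL="gemma-2-9b-it.gguf"                      # hypothetical filename
CMD="./llama-server -m $MODEL -ngl 99 -ctk q4_0"  # -ctk quantizes the K cache
echo "$CMD"
```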
-
Hey there! Super cool project. I thought I'd add some of the (yet to be documented) steps I took to get the application working on my MacBook Pro with an M1 chip.
I did not use the Docker image …
-
### System Info
Name: transformers
Version: 4.45.0.dev0
Name: trl
Version: 0.8.6
### Information
- [ ] The official example scripts
- [X] My own modified scripts
### Tasks
- [ ] A…
-
Having the ability to use the API with paid services is cute and all,
but can we have a local-only option?
Nobody wants to pay for these services anymore, especially as Llama 3.1 blew them away with costly tie…
-
### Self Checks
- [X] This is only for bug report, if you would like to ask a question, please head to [Discussions](https://github.com/langgenius/dify/discussions/categories/general).
- [X] I have s…
-
Here on GitHub it says that it uses GPT-4o, but when testing the tool, it responds as GPT-3.5 Turbo. Do I need to configure something to use 4o? Thank you very much!
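I don't know this tool's internals, but if it wraps the OpenAI chat completions API, the model actually used is whatever the `model` field of the request says — a config default of `gpt-3.5-turbo` somewhere would override what the README claims. A minimal sketch (the helper name and default are assumptions, not the tool's real code):

```python
# Sketch: with the OpenAI chat completions API, the model served is taken
# from the "model" field of each request, regardless of what docs say.
def build_chat_request(prompt: str, model: str = "gpt-4o") -> dict:
    """Build a chat-completions payload pinned to a specific model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_chat_request("Which model are you?")
print(req["model"])  # -> gpt-4o
```

So the thing to check is where the tool's configuration sets this field.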
-
I am trying to run it on an Ubuntu system with local Ollama installed, but I'm facing three issues:
1. The code is unable to create/pull the Docker image.
2. It is using only the CPU (not the GPU…
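On the GPU point, containers only see NVIDIA GPUs when Docker is started with `--gpus all` and the NVIDIA Container Toolkit is installed on the host. On the Ollama side, a minimal stdlib sketch of the payload its `/api/generate` endpoint expects (host and model tag are assumptions; nothing is actually sent here):

```python
import json

# Sketch of a request body for Ollama's /api/generate endpoint.
# Host and model tag are assumptions about the local setup.
OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port

payload = {
    "model": "llama3.1",   # hypothetical local model tag
    "prompt": "Say hello.",
    "stream": False,       # ask for a single JSON response, not a stream
}
body = json.dumps(payload).encode("utf-8")
print(body.decode("utf-8"))
```

If a plain request like this works outside Docker but not inside it, the container likely cannot reach the host's Ollama port.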
-
I am using AutoModelForSequenceClassification to do classification with a large model. Can I use this library, and how should I use it?
Additionally, if my output is only one token and I do batch inference, w…
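On the batching point: a sequence-classification model returns one logit vector per sequence, so batching only requires padding the sequences to a common length and supplying an attention mask so the pad positions are ignored. A minimal stdlib sketch of that padding step (token ids and pad id are made up for illustration):

```python
# Sketch: right-pad token-id sequences to a common length and build the
# attention mask a batched classifier forward pass expects.
def pad_batch(seqs, pad_id=0):
    max_len = max(len(s) for s in seqs)
    input_ids = [s + [pad_id] * (max_len - len(s)) for s in seqs]
    attention_mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    return input_ids, attention_mask

ids, mask = pad_batch([[5, 6, 7], [8, 9], [10]])
print(ids)   # [[5, 6, 7], [8, 9, 0], [10, 0, 0]]
print(mask)  # [[1, 1, 1], [1, 1, 0], [1, 0, 0]]
```

In practice the tokenizer's `padding=True` option does this for you; the sketch just shows what the batch looks like.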
-
### Prerequisites
- [X] I have read the [documentation](https://hf.co/docs/autotrain).
- [X] I have checked other issues for similar problems.
### Backend
Local
### Interface Used
CLI
### CLI Co…
-
Hello,
While running chat-ui and trying some models, I had no problems with Phi-3 and Llama, but when I run Gemma 2 in vLLM I'm not able to make any successful API request.
in env.local:
{
"name": "google/g…