-
## Describe the bug
Have a look :-)
https://github.com/user-attachments/assets/321dbb21-2403-4330-9ce1-091902298888
## Latest commit or version
0.22
MBP M3 Max
-
https://huggingface.co/smallcloudai/Refact-1_6B-fim - via https://news.ycombinator.com/item?id=37381862
-
Using the API server and submitting multiple prompts to take advantage of the speed benefit returns the following error:
"multiple prompts in a batch is not currently supported"
What's the point of …
-
### Issues Policy acknowledgement
- [X] I have read and agree to submit bug reports in accordance with the [issues policy](https://www.github.com/mlflow/mlflow/blob/master/ISSUE_POLICY.md)
### Where…
-
- [ ] [codefuse-chatbot/README_en.md at main · codefuse-ai/codefuse-chatbot](https://github.com/codefuse-ai/codefuse-chatbot/blob/main/README_en.md?plain=1)
-
### Your current environment
Docker latest 0.5.4
```
docker pull vllm/vllm-openai:latest
docker run -d --restart=always \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=10.…
```
-
### What happened + What you expected to happen
I am trying to load a large quantized model with vLLM. It starts loading the model, but sometimes it stops partway through and return…
-
I have fine-tuned the Qwen2-vl 7B model, and I am trying to perform inference but I can't figure out how to do it. The inference command used during fine-tuning is as follows:
```
NFRAMES=24 MAX_PIX…
```
-
### Feature request
I have downloaded the model, so I want to run it using the local copy; the sample is:
docker run --gpus all --shm-size 1g -p 8080:80 -v /data/model/:/data/ \
ghcr.io/predibase/lora…
-
**Is your feature request related to a problem? Please describe.**
A nice property of the `json.RawMessage` design is that it's fairly trivial to safely inspect the broad kind of JSON data with:
…