-
### How would you like to use vllm
I want to run Phi-3-vision with vLLM to support parallel calls with high throughput. In my setup (OpenAI-compatible vLLM 0.5.4 server on a Hugging Face Inference End…
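For context, a request against such an OpenAI-compatible endpoint looks roughly like the sketch below. It only builds the chat-completions payload; the endpoint URL and image URL are placeholders I chose, not details from the report. High throughput would come from issuing many such POSTs concurrently, e.g. with a thread pool.

```python
import json

# Hypothetical endpoint -- substitute your own vLLM deployment URL.
ENDPOINT = "http://localhost:8000/v1/chat/completions"

def build_request(prompt: str, image_url: str) -> bytes:
    """Build an OpenAI-style chat-completions payload with one image."""
    payload = {
        "model": "microsoft/Phi-3-vision-128k-instruct",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 128,
    }
    return json.dumps(payload).encode("utf-8")

body = build_request("Describe this image.", "https://example.com/cat.png")
# POSTing `body` with Content-Type: application/json to ENDPOINT can be done
# with urllib.request or any HTTP client; for throughput, many requests can
# be sent in parallel via concurrent.futures.ThreadPoolExecutor.
```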
-
**Describe the bug**
`python3 phi3v.py -m cuda-int4-rtn-block-32` fails with the following error:
```
Loading model...
Traceback (most recent call last):
  File "phi3v.py", line 66, in <module>
    run(args)
…
```
-
### Prerequisites
- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md)…
-
https://huggingface.co/microsoft/Phi-3-vision-128k-instruct
-
Using the following local model pulled from HF, here is my code:
```
var modelPath =
@"C:\models\Phi-3-vision-128k-instruct-onnx-cpu\cpu-int4-rtn-block-32-acc-level-4";
#p…
```
-
## ⚙️ Request New Models
- Link to an existing implementation (e.g. Hugging Face/Github): https://huggingface.co/microsoft/Phi-3-vision-128k-instruct
- Is this model architecture supported by MLC…
-
### This issue is for a: (mark with an `x`)
```
- [ ] bug report -> please search issues before submitting
- [x] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior …
-
Hello and thank you for the great work here!
We are trying to save a Phi-3 vision model, but are running into some issues saving it as safetensors.
Due to a shared weight, saving unfortunately fa…
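For readers hitting the same failure: safetensors refuses to serialize tensors that share storage (e.g. tied embedding/output weights), so the usual workaround is to deduplicate aliased tensors before saving and re-tie them on load. A minimal, library-free sketch of that idea (the helper name and keys below are mine, not from any library):

```python
def split_shared(tensors: dict) -> tuple[dict, dict]:
    """Separate a state dict into unique tensors and an alias map.

    Entries that are the *same object* (shared storage, e.g. tied
    embedding/output weights) are kept once; `aliases` records which
    key each duplicate should be re-tied to after loading.
    """
    seen: dict[int, str] = {}     # id(tensor) -> first key that owns it
    unique: dict = {}
    aliases: dict[str, str] = {}  # duplicate key -> owning key
    for name, t in tensors.items():
        owner = seen.get(id(t))
        if owner is None:
            seen[id(t)] = name
            unique[name] = t
        else:
            aliases[name] = owner
    return unique, aliases

# Tied weights: both keys point at the very same buffer.
w = [0.1, 0.2, 0.3]
state = {"embed.weight": w, "lm_head.weight": w, "bias": [0.0]}
unique, aliases = split_shared(state)
# Only "embed.weight" and "bias" would be serialized; the alias map
# records that "lm_head.weight" should be re-tied to "embed.weight".
```

This mirrors what `transformers` does internally when saving with `safe_serialization=True`: duplicates are dropped before writing and re-tied on load.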
-
### Your current environment
vllm == 0.5.5
### 🐛 Describe the bug
When we deploy `microsoft/Phi-3.5-vision-instruct`, it randomly hits this issue:
```
(VllmWorkerProcess p…
```
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related issue y…