-
Powerful model trained on synthetic data; has a high MMLU score.
The 4K-context-window variant should be the easier one, as it has no `LongRope`.
https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
https://arxiv.org/pdf…
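For context on the `LongRope` remark: the 4K variant uses standard RoPE, whose per-dimension rotation frequencies follow the usual inverse-power schedule, while the 128K variant rescales those frequencies. A minimal pure-Python sketch of the standard schedule (function names are illustrative, not from the Phi-3 code):

```python
import math

def rope_inv_freq(head_dim: int, base: float = 10000.0) -> list[float]:
    """Standard RoPE inverse frequencies: 1 / base^(2i/d) for each frequency band."""
    return [1.0 / base ** (2 * i / head_dim) for i in range(head_dim // 2)]

def rope_angle(pos: int, inv_freq: list[float]) -> list[float]:
    """Rotation angle applied at token position `pos` in each band."""
    return [pos * f for f in inv_freq]

freqs = rope_inv_freq(64)
# Early bands rotate fast, later bands slowly; within a 4K window no
# per-dimension rescaling (LongRoPE) is needed.
```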
-
### System Info
I ran into a `trtllm-build` issue.
GPU: RTX 3090
I followed the official script with the steps below.
1. I ran the code below after installing the NVIDIA Container Toolkit.
```
docker run -…
-
With how the project is currently configured, everyone needs to use Python to convert models to ONNX, even when just wanting to experiment or try it out from a different language. It'd be better i…
-
While testing Phi-3 I have seen very strange behaviour in MLX that is not present in ollama/llama.cpp.
During inference the first date is systematically wrong (at any temperature, including 0.0, and an…
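For reference on the temperature-0.0 observation: at temperature 0 decoding should reduce to a deterministic argmax over the logits, so a systematically wrong first token is reproducible model behaviour rather than sampling noise. A minimal pure-Python sketch of temperature scaling (illustrative only, not MLX's actual sampler):

```python
import math

def sample_greedy(logits):
    """Temperature 0: deterministically pick the highest-logit token."""
    return max(range(len(logits)), key=lambda i: logits[i])

def softmax_with_temperature(logits, temperature):
    """Temperature > 0: divide logits by T before softmax; low T sharpens
    the distribution toward the greedy choice."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1.0, 3.0, 2.0]
```

As the temperature approaches 0, the softmax puts nearly all mass on the argmax token, matching the greedy pick.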
-
For every model I've downloaded, the speed saturates my bandwidth (~13 MB/s) until it hits 98–99%. Then the download slows to a few tens of KB/s and takes hours to finish.
I've tried multipl…
Pugio updated 1 month ago
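On the 98–99% stall: resumable downloaders fetch only the missing tail of a file via HTTP Range requests, so a stall at the end usually means the tail request or final verification is slow, not the link itself. A self-contained sketch of the Range mechanism (local stdlib server; this is not the Hub's actual client code):

```python
import http.server
import threading
import urllib.request

DATA = b"x" * 990 + b"TAIL-BYTES"  # a 1000-byte "blob" to serve

class RangeHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        rng = self.headers.get("Range")  # e.g. "bytes=990-"
        if rng and rng.startswith("bytes="):
            start = int(rng[len("bytes="):].split("-")[0])
            body = DATA[start:]
            self.send_response(206)  # Partial Content
            self.send_header("Content-Range", f"bytes {start}-{len(DATA) - 1}/{len(DATA)}")
        else:
            body = DATA
            self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example's output quiet

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), RangeHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# "Resume" the download at byte 990: only the final 10 bytes travel.
req = urllib.request.Request(f"http://127.0.0.1:{port}/", headers={"Range": "bytes=990-"})
with urllib.request.urlopen(req) as resp:
    status, tail = resp.status, resp.read()
server.shutdown()
```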
-
### Your current environment
```text
The output of `python collect_env.py`
```
### 🐛 Describe the bug
`python -m vllm.entrypoints.openai.api_server --model microsoft/Phi-3-mini-4k-instruct --dt…
-
## 🐛 Bug
I tried to build this to run the 128K Phi-3, to compare it with pure CPU usage without OpenCL. Then I encountered this error.
## To Reproduce
Steps to reproduce the behavior:
1…
-
I have tried this on Arch Linux with vendor kernel 6.1.(unsure)\
And Armbian with vendor kernel 6.1.43
mike@rock-5a:~/qwen-1_8B-rk3588$ rkllm ./qwen-chat-1_8B.rkllm
RKLLM starting, please wait...
rkllm-runti…
-
> ```shell
> pip's dependency
> ```
Hi @Luo-Z13,
- The error related to `pip's dependency` can be ignored.
- The error `TypeError: pad_sequence(): argument 'padding_value' (posi…
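For reference, a `TypeError` on `padding_value` typically means `torch.nn.utils.rnn.pad_sequence` received something other than a plain number (e.g. a Tensor) for that argument. A pure-Python analogue of the padding semantics (illustrative only, not torch itself):

```python
def pad_sequence(sequences, batch_first=True, padding_value=0.0):
    """Right-pad each sequence to the longest length with a scalar
    padding_value, mimicking torch.nn.utils.rnn.pad_sequence."""
    if not isinstance(padding_value, (int, float)):
        # torch likewise raises a TypeError when given e.g. a Tensor here.
        raise TypeError("padding_value must be a number")
    max_len = max(len(s) for s in sequences)
    padded = [list(s) + [padding_value] * (max_len - len(s)) for s in sequences]
    # torch's default is batch_first=False (sequence-major); transpose for that.
    return padded if batch_first else [list(col) for col in zip(*padded)]

batch = pad_sequence([[1, 2, 3], [4]], padding_value=0.0)
```

Passing a scalar (`0.0`) works; passing a container or tensor-like object raises the `TypeError` seen above.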
-
I downloaded the phi3-mini-128k-instruct-onnx model (cpu_and_mobile/cpu-int4-rtn-blocks-32) from Hugging Face, and used phi3-qa.py to run text generation following the instructions in the [readme]…