-
Hi, I am trying to implement speculative decoding from [Accelerating Large Language Model Decoding with Speculative Sampling](https://arxiv.org/abs/2302.01318), and below is the code snippet:
`…
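For context, the core accept/reject rule from that paper can be sketched in a few lines. The following is a minimal NumPy illustration (the function name, array shapes, and toy inputs are my own assumptions, not the snippet above): each drafted token `x` is accepted with probability `min(1, p(x)/q(x))`, and on the first rejection a corrective token is resampled from the normalized residual `max(0, p - q)`.

```python
import numpy as np

rng = np.random.default_rng(0)

def speculative_step(p_target, q_draft, draft_tokens):
    """Accept/reject drafted tokens as in Chen et al. (2023).

    p_target: (K+1, V) target-model probabilities at each position
    q_draft:  (K, V) draft-model probabilities used to sample draft_tokens
    Returns accepted tokens plus one corrective (or bonus) token.
    """
    out = []
    for i, x in enumerate(draft_tokens):
        p, q = p_target[i], q_draft[i]
        # Accept x with probability min(1, p(x) / q(x))
        if rng.random() < min(1.0, p[x] / q[x]):
            out.append(x)
        else:
            # Rejected: resample from the residual max(0, p - q), renormalized
            residual = np.maximum(p - q, 0.0)
            residual /= residual.sum()
            out.append(rng.choice(len(p), p=residual))
            return out
    # All drafts accepted: sample one bonus token from the final target dist
    out.append(rng.choice(len(p_target[-1]), p=p_target[-1]))
    return out
```

When the draft and target distributions are identical, every drafted token is accepted, so the step returns K accepted tokens plus one bonus token.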
-
I've been too busy, both at work and personally, to post updates... (sorry
-
### Your current environment
My environment setup involving two 8xH100 nodes is detailed in https://github.com/vllm-project/vllm/issues/6775; therefore, I will omit it here for brevity.
### 🐛 De…
-
### Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
### Is there an existing ans…
-
### Your current environment
vLLM: v0.5.4
```
llm = LLM(model="unsloth/Qwen2-7B-Instruct-bnb-4bit", dtype='bfloat16',
          gpu_memory_utilization=0.95, quantization="bitsandbytes", load_for…
-
```
http://static.electroteque.org.s3.amazonaws.com/download/apple-osmf.zip
Here is the refactored code, now packaged as a library, with a working example of the m3u8
parsing and multi-bitrate setup. I'm not s…
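As a rough illustration of what the m3u8 parsing involves, here is a minimal sketch of reading a master playlist's bitrate variants. The playlist text and function name are invented for illustration and are not taken from the linked zip: a master playlist lists one `#EXT-X-STREAM-INF` tag (carrying a `BANDWIDTH` attribute) per variant, followed by that variant's URI on the next line.

```python
import re

def parse_master_playlist(text):
    """Return [(bandwidth, url), ...] for each #EXT-X-STREAM-INF entry."""
    variants = []
    lines = text.strip().splitlines()
    for i, line in enumerate(lines):
        if line.startswith("#EXT-X-STREAM-INF"):
            m = re.search(r"BANDWIDTH=(\d+)", line)
            bandwidth = int(m.group(1)) if m else 0
            # The variant's URI is the line immediately after the tag
            variants.append((bandwidth, lines[i + 1]))
    # Sort lowest to highest bitrate for multi-bitrate switching
    return sorted(variants)

sample = """#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=400000
low/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=800000
high/index.m3u8"""

print(parse_master_playlist(sample))
# [(400000, 'low/index.m3u8'), (800000, 'high/index.m3u8')]
```

Sorting by bandwidth gives the player an ordered ladder to switch through when adapting bitrate.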
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTor…
-
Converting the Qwen2 model using the command `python -c 'import xfastertransformer as xft; xft.Qwen2Convert().convert("/tmp/models/Qwen2-1.5B-Instruct", "/tmp/xf_models/qweb2_1.5b_xf")'`
An error is rep…
-
### What happened?
Hello, llama.cpp experts! Thank you for creating such an amazing LLM Inference system. 😁
**However, while using this system, I encountered unusual results when checking the spe…