-
### Issue
There was a recent thread/blog post about cursor.sh's 'fast apply' changes:
- https://x.com/amanrsanger/status/1790947733899203027
- https://cursor.sh/blog/instant-apply
- > Ou…
-
### Your current environment
PyTorch version: 2.2.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (U…
-
### System Info
TensorRT-LLM: v0.9.0
tensorrtllm_backend: v0.9.0
### Who can help?
@kaiyux
### Information
- [ ] The official example scripts
- [ ] My own modified scripts
### Tasks…
-
I have some questions about the structure of the custom mask for the lookahead and verify branches [as described in the blog](https://lmsys.org/blog/2023-11-21-lookahead-decoding/#lookahead-and-verify-in-the…
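For concreteness, here is a minimal sketch of how I read the combined mask layout (my own simplified illustration, not the exact mask from the blog), assuming a committed prefix of `prefix_len` tokens, a single lookahead branch of `lookahead_len` tokens, and `num_ngrams` verification n-grams of `ngram_len` tokens each; every speculative token sees the whole prefix, and each branch is causal only within itself:

```python
import torch

def build_branch_mask(prefix_len: int, lookahead_len: int,
                      num_ngrams: int, ngram_len: int) -> torch.Tensor:
    """Boolean attention mask (True = may attend). Simplified: the real
    lookahead mask also encodes the 2-D window of Jacobi trajectories."""
    total = prefix_len + lookahead_len + num_ngrams * ngram_len
    mask = torch.zeros(total, total, dtype=torch.bool)

    # Committed prefix: ordinary causal attention.
    mask[:prefix_len, :prefix_len] = torch.tril(
        torch.ones(prefix_len, prefix_len, dtype=torch.bool))

    # Every speculative token attends to the full prefix.
    mask[prefix_len:, :prefix_len] = True

    # Lookahead branch: causal within itself, blind to the verify branches.
    lo = prefix_len
    hi = lo + lookahead_len
    mask[lo:hi, lo:hi] = torch.tril(
        torch.ones(lookahead_len, lookahead_len, dtype=torch.bool))

    # Verification branches: each n-gram is causal within itself only.
    for g in range(num_ngrams):
        s = hi + g * ngram_len
        mask[s:s + ngram_len, s:s + ngram_len] = torch.tril(
            torch.ones(ngram_len, ngram_len, dtype=torch.bool))
    return mask
```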
-
### Your current environment
Why is it important:
This is a prerequisite to the work on enabling torch.compile on vllm; we need to be able to build vllm with nightly so that we can iterate on chan…
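As a reference point for what this unblocks, a minimal torch.compile usage looks like the following (a generic PyTorch 2.x sketch, not vllm code; `TinyMLP` is a made-up module):

```python
import torch

class TinyMLP(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.fc = torch.nn.Linear(16, 16)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.fc(x))

# torch.compile traces and optimizes the module; newer compiler features
# land first in nightly builds, which is why building against nightly matters.
model = torch.compile(TinyMLP())
print(model(torch.randn(4, 16)).shape)
```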
-
### System Info
transformers==4.39.1
python==3.8.17
torch==2.0.1+cpu
### Who can help?
@sanchit-gandhi
### Information
- [ ] The official example scripts
- [ ] My own modified scr…
-
Hi FlexFlow team,
I used the methods mentioned in #1099 to test the latency (GPU: RTX-4090), but I got a confusing result:
1) LLaMA-7B + 1 SSM (llama-160M), latency: 25.1 s
2) LLaMA-7B (without SSMs), la…
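For reference, a minimal sketch of how such an end-to-end latency number can be measured, with a hypothetical `generate` callable standing in for the actual FlexFlow inference entry point:

```python
import time

def measure_latency(generate, prompt: str, n_runs: int = 5) -> float:
    """Average end-to-end generation latency in seconds.
    `generate` is a placeholder for the real inference call."""
    generate(prompt)                      # warm-up run, excluded from timing
    start = time.perf_counter()
    for _ in range(n_runs):
        generate(prompt)
    return (time.perf_counter() - start) / n_runs
```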
-
### System Info
Python 3.10.11
transformers 4.40.0
torch 2.0.1
Linux version 4.15.0-55-generic x86_64
### Who can help?
@ArthurZucker @gante
### Information
- [ ] The official example scripts
…
-
### Your current environment
Running the vllm OpenAI docker container on a single A5000 GPU on Runpod.
Initialisation settings:
`--host 0.0.0.0 --model microsoft/Phi-3-small-8k-instruct --tensor-pa…
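For reference, requests go through the OpenAI-compatible API; a minimal client sketch, assuming the container exposes the default port 8000 on localhost:

```python
from openai import OpenAI

# Unless the server was started with an API key, any placeholder key works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="microsoft/Phi-3-small-8k-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```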
-
To the best of my knowledge, speculative decoding does not change the decoding result when using greedy decoding. However, I noticed that the rouge2 metrics of 'base' and 'essg' may be different in th…
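To make the losslessness argument concrete, here is a minimal sketch of the greedy accept rule (my own illustration, not the repo's code): a draft token is kept only if it equals the target model's argmax at that position; on the first mismatch the target's token is emitted instead and the rest of the draft is discarded, so the committed sequence matches plain greedy decoding:

```python
import torch

def greedy_verify(draft_tokens, target_logits):
    """For each draft position i, target_logits[i] holds the target model's
    logits given the prefix plus draft_tokens[:i]."""
    accepted = []
    for tok, logits in zip(draft_tokens, target_logits):
        target_tok = int(torch.argmax(logits))
        if tok != target_tok:
            accepted.append(target_tok)  # correct the mismatch and stop
            break
        accepted.append(tok)
    return accepted
```

Given that rule, any rouge2 gap between 'base' and 'essg' under greedy decoding would point to something other than acceptance itself, e.g. sampling settings or numerical differences between batched and unbatched forward passes.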