-
# 🐛 Bug
```py
from vllm import LLM, SamplingParams

llm = LLM(model=model_dir, enforce_eager=True)
```
then the following error is raised:
```
File d:\my\env\python3.10.10\lib\site-packages\xformers\ops\fmha\_triton\splitk_kernels.…
```
-
Hi! Thanks for your amazing work. I tested several pipelines and the speed of this framework is truly impressive🔥
However, I encountered an issue when using the stable-fast setting with the `e…
-
### 🐛 Describe the bug
After quantizing the resnet50_clip.openai model with torch.ao quantization, the last step, `exir.to_edge()`, fails quite often — not only with this model but with many others:
`…
-
Hi. I'm wondering: when I set cuda graph to false and compare against the pipeline without the stable_fast node, why is the latter faster?
This is with cuda graph set to false:
![image](https://github.com/gameltb/…
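One thing worth checking in comparisons like this is warm-up: CUDA graph capture and stable_fast's first-call optimization both front-load work, so timing the first iterations skews the result. Below is a minimal, framework-agnostic timing-harness sketch (pure Python; the measured workload is a trivial stand-in, not the actual pipeline):

```python
import time

def benchmark(fn, warmup=3, iters=10):
    """Time a callable after warm-up runs, returning mean seconds per iteration.

    Warm-up matters here: one-time costs (e.g. CUDA graph capture or a
    framework's first-call compilation) would otherwise be charged to the
    first measured iteration and distort the comparison.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

# Trivial stand-in workload for illustration only:
mean_s = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"{mean_s * 1e3:.3f} ms/iter")
```

Running both configurations through a harness like this (same warm-up, same iteration count) makes the numbers comparable; note that for real GPU workloads you would also need to synchronize the device before reading the clock.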
-
Once https://github.com/JuliaGPU/CUDA.jl/pull/1380 is merged, we should be able to have first-class support for `graph_type=:sparse`. Therefore, PR #66 should be revamped with the latest CUDA updates.
…
-
*Note*: If you have a model or program that is not supported yet but should be, please use the program coverage template.
## 🐛 Bug
```py
import thunder
import torch
def func(x):
    return…
-
### Describe the bug
The Graph/RecordReplay/usm_fill.cpp test has been observed to time out in CUDA CI for unrelated changes. For example, see https://github.com/intel/llvm/pull/14985.
```
TIMEOUT…
-
### Describe the issue
I'm using onnx-tensorrt.
When I enable `trt_cuda_graph_enable` like this:
![image](https://github.com/microsoft/onnxruntime/assets/67405690/0f239de5-f995-43df-aa8a-805674…
-
## 🐛 Bug
![garbled output](https://github.com/user-attachments/assets/4f446294-a903-412d-ad98-987d0f04a60a)
## To Reproduce
Steps to reproduce the behavior:
1. Compile:
mlc_llm compile /path/to/internl…
-
PyTorch now has some support for representing variable-length (varlen) sequences, and HF supports it to some extent:
- https://medium.com/pytorch/bettertransformer-out-of-the-box-performance-for-huggingface-transfor…
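For readers unfamiliar with the varlen layout: the common trick (used, e.g., by flash-attention's varlen kernels) is to concatenate sequences of different lengths into one flat buffer and track the boundaries with cumulative offsets (the `cu_seqlens` convention), avoiding padding entirely. A minimal pure-Python sketch of the idea — function names here are illustrative, not PyTorch's API:

```python
from itertools import accumulate

def pack_varlen(seqs):
    """Pack variable-length sequences into one flat list plus
    cumulative-offset boundaries (the `cu_seqlens` convention)."""
    flat = [tok for seq in seqs for tok in seq]
    # cu_seqlens[i] marks where sequence i starts; the last entry
    # is the total packed length.
    cu_seqlens = [0] + list(accumulate(len(s) for s in seqs))
    return flat, cu_seqlens

def unpack_varlen(flat, cu_seqlens):
    """Recover the original sequences from the packed layout."""
    return [flat[a:b] for a, b in zip(cu_seqlens, cu_seqlens[1:])]

seqs = [[1, 2], [3, 4, 5], [6]]
flat, cu = pack_varlen(seqs)
print(flat)  # [1, 2, 3, 4, 5, 6]
print(cu)    # [0, 2, 5, 6]
assert unpack_varlen(flat, cu) == seqs
```

PyTorch's nested tensors and the attention kernels that consume `cu_seqlens` build on the same packed representation, just with tensors instead of lists.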