-
### 🐛 Describe the bug
I keep seeing comparisons between JAX / TF / Keras and `torch.compile` that benchmark the default XLA settings against the default `torch.compile` settings, only to find that XLA frontends…
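Defaults-only comparisons can also be skewed by one-time tracing/compilation cost on the first call. A minimal, framework-agnostic timing harness that excludes warm-up might look like this (the `bench` helper is a hypothetical sketch, not from any of the libraries mentioned):

```python
import time

def bench(fn, *args, warmup=3, iters=10):
    """Time a callable after warm-up runs, so one-time compilation
    cost (XLA tracing, torch.compile's first call) is excluded."""
    for _ in range(warmup):  # let any JIT/compile caches fill
        fn(*args)
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - start) / iters

# Hypothetical usage: compare two configurations of the same model
# t_default = bench(compiled_default, batch)
# t_tuned   = bench(compiled_max_autotune, batch)
```

Measuring steady-state iterations this way at least puts both compilers on an equal footing, whatever flags each one was given.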
-
### Your current environment
The output of `python collect_env.py`
```text
root@newllm201:/workspace# vim collect.py
root@newllm201:/workspace# python3 collect.py
Collecting environment info…
```
-
@jerryzh168 I think it would be beneficial to be able to load a quantized and compiled model and proceed straight to inference.
However, I am not sure which functions to use to make this happen. …
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related issue y…
-
While using the torch_cluster `knn_graph` function, I get an empty `edge_index` depending on the order in which the GPUs are used.
GPUs other than 0 work only after GPU 0 has been used.
torch.__version__ '2.4.0+cu121…
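One way to tell whether the empty `edge_index` is a device-ordering bug rather than a data problem is to compare against a CPU brute-force reference. The `knn_edges` helper below is a hypothetical, pure-Python sketch that mirrors the `(source, target)` edge layout of `torch_cluster.knn_graph` (exact tie-breaking and edge direction may differ from the CUDA kernel):

```python
def knn_edges(points, k):
    """Brute-force k-nearest-neighbour edges over a list of points.
    Returns (source, target) pairs like torch_cluster.knn_graph's
    edge_index columns, built entirely on CPU so GPU ordering
    cannot affect the result."""
    edges = []
    for i, p in enumerate(points):
        # squared distances to every other point, ties broken by index
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points) if j != i
        )
        for _, j in dists[:k]:
            edges.append((j, i))  # neighbour j -> centre i
    return edges
```

If this reference returns non-empty edges for the same input while `knn_graph` on GPU N returns an empty tensor, the problem is in the device handling, not the data.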
-
Does DeepSpeed support PyTorch code with [CUDA Graphs](https://pytorch.org/blog/accelerating-pytorch-with-cuda-graphs/)? If not, do you think it may be helpful to DeepSpeed users for further speedups?
…
-
Hi! Thanks for your amazing work. I tested several pipelines and the speed of this framework is truly impressive 🔥
However, I have encountered an issue when using the stable-fast setting with the `e…
-
### 🐛 Describe the bug
Dynamo currently captures torch.cuda.stream via https://github.com/pytorch/pytorch/pull/93808. However, for other backends with streams, the capture wouldn't happen. There s…
-
## Description
After building a model with TensorRT, I get an `ICudaEngine` object and query the model's output dimensions with `ICudaEngine.getTensorShape`.
The output dimensions contain -1. Then sometimes there …
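A -1 in the shape returned by the engine marks a dynamic dimension; it only resolves to a concrete value once the runtime input shapes are known (with the execution context, I believe via `set_input_shape` followed by `get_tensor_shape` in recent TensorRT versions). The `resolve_dims` helper below is a hypothetical, pure-Python illustration of that substitution, not part of the TensorRT API:

```python
def resolve_dims(engine_shape, runtime_shape):
    """Replace TensorRT's dynamic-dimension marker (-1) with the
    concrete value observed at runtime. Illustrative only: with the
    real API the execution context performs this resolution after
    the input shapes are set."""
    if len(engine_shape) != len(runtime_shape):
        raise ValueError("rank mismatch")
    return [r if e == -1 else e
            for e, r in zip(engine_shape, runtime_shape)]
```

For example, an engine-level `[-1, 3, 224, 224]` with a runtime batch of 8 resolves to `[8, 3, 224, 224]`.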
-
### vLLM latest
I added some logging in /vllm/model_executor/models/llama.py because I want to print the attention output, like this:
When I start the LLM server, the error is
[rank0]: During handling of the above e…