-
Quick question: when is a release of optimum-habana that includes https://github.com/huggingface/optimum-habana/issues/1154 (the `rope_scaling` fix for the Llama 3.1 family) planned?
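For context, a minimal sketch of what the linked issue is about, under my reading of it: Llama 3.1 checkpoints ship a new `rope_scaling` schema (`rope_type: "llama3"` with extra frequency-factor keys), while older validation code only accepts the legacy two-key `{"type", "factor"}` form. The dict values below are the ones Llama 3.1 configs are known to use; the helper function is purely illustrative, not optimum-habana's actual check.

```python
# The rope_scaling block that Llama 3.1 checkpoints ship with.
llama31_rope_scaling = {
    "rope_type": "llama3",
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
}

def is_legacy_schema(rope_scaling: dict) -> bool:
    """Illustrative stand-in for the old validation: accept only the
    legacy {"type", "factor"} two-key schema."""
    return set(rope_scaling) == {"type", "factor"}

# The Llama 3.1 block fails the legacy check, which is why unpatched
# versions reject the config at model-load time.
print(is_legacy_schema(llama31_rope_scaling))
```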
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
### System Info
TRT-LLM 0.7.1
Host: g5.12xlarge EC2 instance (A10G)
Memory size: 23028 MiB
CUDA: 12.2
Model: GPT2
### Who can help?
_No response_
### Information
- [X] The official example scri…
-
### Question Validation
- [X] I have searched both the documentation and Discord for an answer.
### Question
I installed llama-index with the command `pip install llama-index` and install t…
-
Hi team,
Optimum Neuron is looking into adding speculative decoding support for some seq2seq models. There seems to be an example from the Annapurna team, but the link to the resource is missing. C…
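For readers unfamiliar with the technique being requested, here is a toy sketch of the draft-then-verify loop at the heart of speculative decoding. Both "models" are hypothetical stand-ins (a real implementation drafts with a small decoder and verifies against the large model's distribution, resampling the first rejected position); this only shows the control flow.

```python
import random

def draft_model(prefix, k):
    # Toy draft model: propose k candidate tokens (stand-in for a small,
    # fast decoder running ahead of the target model).
    random.seed(len(prefix))
    return [random.randint(0, 9) for _ in range(k)]

def target_accepts(prefix, token):
    # Toy acceptance rule: stand-in for checking the candidate against
    # the large target model's distribution. Here: accept even tokens.
    return token % 2 == 0

def speculative_step(prefix, k=4):
    """One draft-then-verify step: keep the draft's tokens until the
    target first disagrees, then stop. (A real implementation would
    resample the rejected position from the target distribution, so a
    step always yields at least one token.)"""
    accepted = []
    for t in draft_model(prefix, k):
        if not target_accepts(prefix + accepted, t):
            break
        accepted.append(t)
    return accepted
```

The payoff is that one target-model pass can validate several drafted tokens at once, amortizing the expensive model over multiple output tokens.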
-
### System Info
Transformers Version: 4.42.0
Python version: 3.10.14
### Who can help?
@sanchit-gandhi
### Information
- [ ] The official example scripts
- [X] My own modified scripts
### …
-
### Anything you want to discuss about vllm.
### Your current environment
The output of `python collect_env.py`
```text
PyTorch version: 2.4.0+cu121
Is debug…
-
### System Info
CUDA Version: 12.2
transformers Version: 4.44.2
Python: 3.12.4
Operating system: Windows Subsystem for Linux (WSL) in VS Code
### Who can help?
_No response_
#…
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTor…
-
### Misc discussion on performance
I've been running some simple tests on multi-node pipeline parallelism over NCCL. I doubled the bandwidth between the nodes but saw no increase in tokens/s or throughput.…
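One common explanation, sketched with purely illustrative numbers (none of them are from this report): in pipeline parallelism only the activations at stage boundaries cross the inter-node link, so the traffic per generated token is tiny relative to typical link capacity. If the link was never saturated, doubling its bandwidth cannot raise throughput; the run is compute- or latency-bound instead.

```python
# Back-of-envelope estimate of inter-node traffic in pipeline-parallel
# decoding. All numbers are assumptions for illustration.
hidden_size = 4096        # assumed model hidden dimension
bytes_per_elem = 2        # fp16/bf16 activations
tokens_per_sec = 1000     # assumed aggregate decode throughput
boundaries = 1            # inter-node stage boundaries crossed per token

# During decode, each new token sends one hidden-state vector across
# each stage boundary.
bytes_per_sec = hidden_size * bytes_per_elem * tokens_per_sec * boundaries
gbits_per_sec = bytes_per_sec * 8 / 1e9
print(f"{gbits_per_sec:.3f} Gbit/s")  # well under typical inter-node bandwidth
```

Under these assumptions the pipeline moves less than 0.1 Gbit/s between nodes, so per-message latency (not bandwidth) is the quantity that matters for tokens/s.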