-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTor…
-
Running the model's forward pass within a process seems to get stuck. I tried setting `TOKENIZERS_PARALLELISM` to both `true` and `false`, but unfortunately neither helped 🥲
### System Info
`transformers-cli…
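For reference (this snippet is not part of the original report), the variable is typically set in code before the tokenizer is first used, rather than in the shell; a minimal sketch:

```python
import os

# Disable the Rust tokenizers' internal thread pool. This must be set
# before the tokenizer backend is first used; it is the usual mitigation
# for hangs/deadlocks when a process forks after tokenization has run.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# ... then import and use the tokenizer as usual, e.g.:
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("gpt2")
```

If the hang persists with parallelism disabled, the cause is likely elsewhere (e.g. a fork after CUDA initialization) rather than the tokenizer.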
-
It seems there is a bug in the `use_mem_eff_path` feature when `ngroups` is greater than 1. The loss curve initially decreases but then stabilizes around a constant value and fails to co…
-
### What happened + What you expected to happen
```python
@pytest.mark.parametrize("ray_start_cluster_head_with_env_vars", [
{
"include_dashboard": True,
"env_va…
-
The example should show tensor parallelism. I am not sure if Serve + vLLM + tensor parallelism works at the moment because the Serve deployment will request N GPUs, then each vLLM worker will request …
-
### Proposal to improve performance
_No response_
### Report of performance regression
_No response_
### Misc discussion on performance
Hi,
Thank you for your contribution to the LLM community…
-
## ❓ Question
## What you have already tried
## Environment
> Build information about Torch-TensorRT can be found by turning on debug messages
- PyTorch Version (e.g., 1.0):
- C…
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.2.1+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A
…
-
I didn't see any documentation that mentions that.
-
In Megatron, I found the following check for `tp_comm_overlap` and `sequence_parallel`:
```python
if args.tp_comm_overlap:
assert args.sequence_parallel == True, 'Tensor parallel communicatio…