-
Platforms: rocm, linux
This test was disabled because it is failing in CI. See [recent examples](https://hud.pytorch.org/flakytest?name=test_cudagraphs_cpu_scalar_used_in_cpp_custom_op&suite=TestCo…
-
## 🐛 Bug
I want to increase my model's batch size, but memory fills up quickly. However, when I look at the memory figures, they are not consistent between memory_summary and nvidia-smi…
-
Yesterday I started a training run with DistributeFilesDataset and file caching. Today it crashed, and it now crashes consistently after restarting with what I think is `OSError: AF_UNIX path too long` in the `…
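
For reference, here is a minimal sketch of how this error message can arise in plain Python. It assumes the error comes from binding a Unix domain socket to an overly long filesystem path, which is only a guess about the crash above. On Linux the `sun_path` field of the socket address is 108 bytes, and CPython raises `OSError` with this message when the path does not fit. The helper `try_bind` and the paths below are hypothetical and used only for illustration.

```python
import os
import socket
import tempfile


def try_bind(path):
    """Bind a Unix domain socket to `path`.

    Returns None on success, or the OSError raised by bind().
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
        return None
    except OSError as exc:
        return exc
    finally:
        sock.close()
        # Remove the socket file if bind() managed to create it.
        if os.path.exists(path):
            os.unlink(path)


# Hypothetical path: a 200-character file name, well past the sun_path limit.
base = tempfile.mkdtemp()
long_path = os.path.join(base, "x" * 200)
err = try_bind(long_path)
print(type(err).__name__, err)
os.rmdir(base)
```

If this is indeed the cause, a shorter temporary directory (for example via the `TMPDIR` environment variable, which `tempfile` honours) would be one thing to try.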
-
Platforms: linux
This test was disabled because it is failing in CI. See [recent examples](https://hud.pytorch.org/flakytest?name=test_retrace_export_while_loop_simple_cpu_float32&suite=TestHOPCPU&li…
-
### Bug description
Suppose I have a `LightningModule` (parent) that contains an `nn.Module` (child), which in turn contains another `LightningModule` (grandchild). Calling `.log` inside the `Lightnin…
-
Platforms: linux
This test was disabled because it is failing in CI. See [recent examples](https://hud.pytorch.org/flakytest?name=test_sdp_math_gradcheck_contiguous_inputs_False_cuda&suite=TestSDPACU…
-
Thanks for the help with issue #126.
I ran into a question when trying to run the reference on CPU **_without CUDA_**. The steps to reproduce are as follows:
Requirements
```
pip install triton==2.2 (Req…
-
### System Info
Transformers 4.41.2
PyTorch 2.3.1+cu121
Python 3.12.3
Ubuntu 24.04
GPU: NVIDIA GeForce GTX 1650
### Who can help?
_No response_
### Information
- [ ] The official …
-
According to [this table](https://huggingface.co/docs/transformers/main/en/quantization/overview#when-to-use-what), `quanto` supposedly works with `torch.compile`. However, I get dynamo errors when tr…
-
### Steps to reproduce
`xtuner train llava_internlm2_chat_20b_clip_vit_large_p14_336_e1_gpu8_pretrain.py`
### Configuration file
Only the dataset and model paths were changed.
### Run log
```
Map (num_proc=32): 100%|████████████████████████████████████████…