-
## 🐛 Bug
With the newest version of the Docker image (tested on 2024-04-28), training with thunder.jit on 8xA100 fails for the Platypus-30B and vicuna-33b-v1.3 models. This is the error:
> …
-
### 🐛 Describe the bug
NVFuser is hitting a fallback when running demucs on torchbench + lazy tensor.
Repro:
```python
import torch
from torch.utils.jit.log_extract import run_nvfuser, load_gra…
```
-
### 🚀 The feature, motivation and pitch
In particular, `hf_DistilBert_forward_0` has ~30% regression compared to eager.
Status document (for those who have access): https://docs.google.com/docum…
-
Per the title, this has happened on V100 and A100 a few times before:
```python
00:13:04 =================================== FAILURES ===================================
00:13:04 _____________________…
```
-
FP max reductions would typically look like:
```
for(nvfuser_index_t i154 = 0; i154 < 8; ++i154) {
int i299;
i299 = 4 * i154;
#pragma unroll
  for(nvfuser_index_t i156 …
```
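For reference, the eager-mode computation such a generated kernel lowers is simply a floating-point max reduction; a minimal PyTorch sketch (the shape here is an assumption, not taken from the kernel above):

```python
import torch

# Hypothetical shape; the outer loop above iterates i154 over 8 rows.
x = torch.randn(8, 4)

# FP max reduction over the inner dimension -- the op the kernel implements.
out = torch.amax(x, dim=1)
```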
-
I'm separating this from #1502. While we can get rid of `cat` in some cases, improving nvFuser's codegen for `slice` and `cat` will still benefit the RoPE module and the QKV split around it.
https…
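For context, the slice/cat pattern in question is the standard rotate-half helper used by RoPE implementations; a minimal sketch (names and shapes are illustrative, not taken from the issue):

```python
import torch

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # Two `slice` ops followed by one `cat`: the pattern whose
    # codegen this issue wants to improve.
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(q: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # Standard rotary position embedding application.
    return q * cos + rotate_half(q) * sin
```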
-
## 🐛 Bug
Repro script from @nikitaved
```python
import itertools
import torch
from torch.utils.benchmark import Timer, Compare
import thunder
#dtypes = [torch.bfloat16, torch.float, torch.double…
```
-
This test fails to compile:
```python
def test_simple_slice_fusion_bfloat16(self):
    inputs = [torch.randn((10,), dtype=torch.bfloat16, device="cuda:0")]
    def fusion_func(fd: Fu…
```
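For reference, the eager-mode equivalent the fusion should match is just a slice of a bfloat16 tensor; a minimal PyTorch sketch (run on CPU here, and the slice bounds are assumptions since the test body is truncated):

```python
import torch

# bfloat16 input analogous to the test's (10,) CUDA tensor, on CPU here.
t = torch.randn((10,), dtype=torch.bfloat16)

# Hypothetical slice bounds; this is the eager reference the fusion
# would be checked against.
ref = t[2:8]
```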
-
### 🐛 Describe the bug
Per the [setup.py](https://github.com/pytorch/pytorch/blob/main/setup.py#L171) and past experience from torch 1.13, setting the `CUDNN_LIB_DIR` environment variable should po…
-
This test started failing a couple of days ago.
`python_tests.test_normalization.test_instance_norm_multigpu`
See also https://gitlab-master.nvidia.com/dl/pytorch/update-scripts/-/issues/50225