-
### 🐛 Describe the bug
Creating an issue for https://github.com/csarofeen/pytorch/pull/1930#issuecomment-1232175340
```
PYTORCH_NVFUSER_ENABLE="transpose_scheduler" ./build/bin/nvfuser_bench "-…
-
After enabling aten2aten decomps:
===== dlrm_backward_0 ======
Generating testing data...
aten2aten decomp: aten.detach.default
aten2aten decomp: aten.detach.default
aten2aten decomp: aten.deta…
-
The current reduction scheduler limits the types of epilogue pointwise ops that can be fused, via `SchedulerTopologyChecker`.
It needs further work in the following areas:
(1) the tests are missing
(2) th…
-
## 🐛 Bug
When benchmarking the model 'Mixtral-8x7B-v0.1', we get OOM errors even with `--checkpoint_activations True`.
The same configuration works with torch.compile.
Might be related to [https://gi…
-
Noticed a performance regression in layer norm backward from July 19 to July 26 (no CI data from July 22-25) on H100 with hidden sizes around 15K to 16K. See [dashboard](http://nv/eh3). SOL dropped …
-
From the CI run in an intermediate version of #691 :
FAILED thunder/tests/test_grad.py::test_vjp_correctness_sdpa_manual_grad_forward_scaled_dot_product_attention_nvfuser_cuda_thunder.dtypes.float1…
-
The following code results in a segfault as of yesterday (e.g. commit 1a5db862df21e5dabaeb0f3648a012ea60cee8c3)
```python
import torch
import torch.nn.functional as F
import nvfuser
def test_em…
-
# Repro
This is a follow-up to https://github.com/NVIDIA/Fuser/pull/1649#discussion_r1535920468.
I created a simple repro on the `wjy/input` branch, which you can run with:
```
$ git fetch origin wjy/input
…
-
### 🐛 Describe the bug
In the TorchDynamo+AOT_Autograd+Primtorch stack, Dropout is currently implemented as a decomposition when traced by AOT_Autograd. The decomposition calls `rand_like` to provid…
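The decomposition described above can be sketched as follows. This is a minimal, hypothetical illustration of the general pattern (a `rand_like`-based keep-mask plus rescaling), not the exact decomposition in the PrimTorch source; the function name `dropout_decomp` is made up for this sketch.

```python
import torch

def dropout_decomp(x: torch.Tensor, p: float, train: bool) -> torch.Tensor:
    """Sketch of dropout as a decomposition: rand_like provides the
    randomness for a Bernoulli(1 - p) keep-mask, and surviving elements
    are rescaled by 1 / (1 - p) so the expected value is preserved."""
    if not train or p == 0.0:
        return x
    keep_mask = torch.rand_like(x) > p  # True where the element is kept
    return x * keep_mask / (1.0 - p)

x = torch.ones(4, 4)
out = dropout_decomp(x, p=0.5, train=True)
# Each element is either dropped (0.0) or kept and scaled (2.0).
```

Because the mask comes from `rand_like`, tracing this decomposition bakes the RNG call into the graph, which is the crux of the reproducibility concern raised in this issue.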
-
I am getting this strange error, but despite it, RFdiffusion actually seems to run fine: I still get output structures that look correct. Is this something I should be concerned about?
```
/usr/…