-
### 🐛 Describe the bug
Running the following test:
```
python -m unittest test_jit_cuda_fuser.TestCudaFuser.test_native_layer_norm_bfloat -v
```
Results in the error:
```
==============…
-
## 🐛 Bug
Conversation from Horace:
```
hmmm... I think it has something to do with the previous sort error I mentioned.
Horace He 3 days ago
I'm writing a test minimizer, and that error st…
-
## 🐛 Bug
A JITed model runs ~14x slower for its first ~20 evaluations on pytorch nightly, relative to stable torch 1.7.1
## To Reproduce
Run the following code in pytorch:
```
import…
-
### 🐛 Expand indexing error causes incorrect results
It seems we are indexing into the expanded tensor incorrectly. The repro works correctly when the expand is dropped and we treat the size-1 axis…
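To make the failure mode concrete, here is a hedged, stdlib-only sketch (not the actual repro, which is truncated above): `expand` on a size-1 axis produces a view whose stride on that axis is 0, so every index along it must alias the same storage; indexing math that instead treats the axis as materialized computes out-of-range offsets. The names `linear_offset`, `storage`, and the shapes are illustrative assumptions, not from the issue.

```python
# Illustrative sketch: why a size-1 expanded axis needs stride 0.

def linear_offset(indices, strides):
    """Map a multi-dimensional index to a flat storage offset."""
    return sum(i * s for i, s in zip(indices, strides))

# Storage for a (1, 3) tensor, row-major strides (3, 1).
storage = [10.0, 20.0, 30.0]
base_strides = (3, 1)

# Expanding to (4, 3) keeps the same storage; the size-1 axis gets
# stride 0, so all 4 "rows" alias row 0.
expanded_strides = (0, 1)

# Correct indexing: every expanded row reads the same three values.
rows = [[storage[linear_offset((r, c), expanded_strides)] for c in range(3)]
        for r in range(4)]

# Incorrect indexing (keeping the pre-expand stride of 3 on the
# broadcast axis) computes an offset past the end of storage.
bad_offset = linear_offset((1, 0), base_strides)  # 3, but storage has len 3
```

Dropping the expand and keeping the axis at size 1, as the issue notes, sidesteps exactly this: no index along the broadcast axis ever exceeds 0.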
-
### 🚀 The feature, motivation and pitch
Not easily knowing when something is fused has bitten the e3nn team quite a bit I think. Turns out control flow prevented all their [ridiculously fuseable co…
-
### 🐛 Describe the bug
PR #81785 seems to have significantly increased start-up overhead for timm models and has timed out our internal CI.
Overall throughput is about the same, but end-2-end test…
-
If compute-at maps don't capture the full thread-binding relationships, indexing is incorrect, because that information isn't otherwise available in the indexing math. For example, if we have `blockIdx…
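A hedged, stdlib-only sketch of the failure mode (not NVFuser code; `GRID`, `BLOCK`, and the index helpers are illustrative assumptions): if the producer's index is derived without knowing that the consumer's loop is bound to `blockIdx.x`, the `blockIdx` term never enters the index, and every block reads block 0's data.

```python
# Illustrative sketch: losing a blockIdx binding in the indexing math.

GRID = 4   # number of "blocks" (blockIdx.x values)
BLOCK = 8  # "threads" per block (threadIdx.x values)

def consumer_index(block_idx, thread_idx):
    # Consumer loop split and bound: i = blockIdx.x * BLOCK + threadIdx.x
    return block_idx * BLOCK + thread_idx

def producer_index_with_binding(block_idx, thread_idx):
    # Correct: the indexing math carries the blockIdx relationship.
    return block_idx * BLOCK + thread_idx

def producer_index_missing_binding(block_idx, thread_idx):
    # Broken: the blockIdx binding was lost from the map, so the
    # blockIdx term is absent and all blocks alias block 0.
    return thread_idx

ok = all(consumer_index(b, t) == producer_index_with_binding(b, t)
         for b in range(GRID) for t in range(BLOCK))
mismatches = [(b, t) for b in range(GRID) for t in range(BLOCK)
              if consumer_index(b, t) != producer_index_missing_binding(b, t)]
```

Only block 0 agrees in the broken case; every other block's indices are off by `blockIdx.x * BLOCK`.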
-
Vectorized loads from SMEM (shared memory) seem to have a problem.
```
TEST_F(NVFuserTest, FusionSmemVectorize_CUDA) {
Fusion fusion;
FusionGuard fg(&fusion);
auto tv0 = makeContigTensor(1);
fusion.addI…
-
## 🚀 Feature
Forwarding feature request from Horace@functorch
```
def f(x):
    x = x * torch.tensor(1.0)
    x = x * torch.tensor(1.0)
    return x
```
Runs fine in eager/TS, but NVFuser fail…
-
This seems to be wrong:
https://github.com/csarofeen/pytorch/blob/devel/torch/csrc/jit/codegen/cuda/runtime/warp.cu#L47
```
if (read_write_pred && is_warp_head) {
shared_mem[smem_offset…