-
```
In [2]: import torch
In [3]: def fn(x):
   ...:     x = x.amax(dim=0) * .1
   ...:     return x
   ...:
   ...: a = torch.tensor([-float('inf'), -float('inf'), -float('inf'), -float('inf')], de…
```
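For reference, the eager-mode semantics exercised here, a max reduction over all-`-inf` inputs followed by a scale, can be mimicked in plain Python without torch; this sketch is illustrative only, not the fuser's code path:

```python
import math

def fn_reference(xs):
    # amax over the only dimension, then scale by .1
    return max(xs) * 0.1

a = [-math.inf] * 4
print(fn_reference(a))  # the max of all -inf inputs is -inf, and -inf * 0.1 stays -inf
```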
-
### 🐛 Describe the bug
The bias+gelu backward kernel, which is really gelu_backward plus an outer reduction for the bias, is failing with the FP16 data type. This fusion is found in the backward pass of …
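As a point of reference, the mathematical shape of this fusion, a pointwise gelu_backward followed by a sum over the batch (outer) dimension for the bias gradient, can be sketched in plain Python. The function names below are illustrative, not the kernel's actual API:

```python
import math

def gelu_grad(x):
    # derivative of the exact (erf-based) GELU:
    # d/dx [0.5*x*(1+erf(x/sqrt(2)))] = 0.5*(1+erf(x/sqrt(2))) + x*phi(x)
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))) + \
        x * math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def bias_gelu_backward(grad_out, inp):
    # grad_out, inp: [rows][cols]; pointwise gelu_backward first
    grad_in = [[g * gelu_grad(x) for g, x in zip(g_row, x_row)]
               for g_row, x_row in zip(grad_out, inp)]
    # the "outer reduction": sum over the batch dimension gives grad_bias
    grad_bias = [sum(col) for col in zip(*grad_in)]
    return grad_in, grad_bias
```

The precision issue in the report is specific to running this pattern in FP16; the sketch above only shows the operation structure.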
-
We implement support for T5 models.
In this first step, only kernels are replaced; no other optimizations are applied.
- Support T5 masks #66
- Support T5 kernels patterns #63
- Support T5 missing act…
-
Some of the shift/gather tests fail non-deterministically due to missing `syncthreads`.
For example, in `FusionMaxPoolingStrided`, there must be a `syncthreads` after loading to `T3`, but that's m…
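The failure mode is a read-after-write race on shared memory: one thread reads a slot another thread has not yet written. Its effect can be sketched in plain Python with `threading.Barrier` standing in for `__syncthreads()`; the buffer size and values below are made up for illustration:

```python
import threading

N = 4
shared = [None] * N                # stands in for a shared-memory buffer like T3
barrier = threading.Barrier(N)     # stands in for __syncthreads()
out = [None] * N

def worker(tid):
    shared[tid] = tid * 10         # each "thread" loads its own element into shared memory
    barrier.wait()                 # without this barrier, the read below may observe None
    out[tid] = shared[(tid + 1) % N]  # read a value written by a *different* thread

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(out)  # [10, 20, 30, 0]
```

Removing the `barrier.wait()` line makes the reads racy, which is exactly why the missing `syncthreads` shows up only non-deterministically.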
-
(and possibly other operations)
```
import torch

def fn(x):
    x = x.clamp(min=1.) * .1
    return x

a = torch.tensor([1., float('inf'), 2., float('inf')], device="cuda")
scripted = torch.jit.script(fn)
fn(a)
…
```
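For comparison, the eager-mode semantics being tested here, clamp to a minimum of 1 and scale by 0.1, can be reproduced in plain Python with `math.inf`; this sketch does not use the fuser at all and only documents the expected values:

```python
import math

def fn_reference(xs):
    # clamp(min=1.) then scale by .1, elementwise
    return [max(x, 1.0) * 0.1 for x in xs]

a = [1.0, math.inf, 2.0, math.inf]
print(fn_reference(a))  # infinities must survive the clamp: [0.1, inf, 0.2, inf]
```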
-
This is related to #1510, which is fixed by #1516 for gather. The same error can happen with shift, e.g.,
```
TEST_F(NVFuserTest, FusionValidateParallelizeShift_CUDA) {
  Fusion fusion;
  Fusion…
```
-
```
import torch

def fn(x):
    x = x.clamp(min=1.) * .1
    return x

x = torch.randn(4, device="cuda", dtype=torch.bfloat16)
scripted = torch.jit.script(fn)
fn(x)
with torch.jit.fuser("fuser2"):
    for _ in …
```
-
When I try to follow `examples/pytorch`, the Triton server crashes, i.e. exits with status code `-6`.
This is the log from the container:
```
I1004 17:32:10.693691 41 grpc_server.cc:4375] St…
```
-
This logic seems wrong (blaming myself):
https://github.com/csarofeen/pytorch/blob/devel/torch/csrc/jit/codegen/cuda/lower_predicate.cpp#L343-L348
```
filters.emplace_back([this](Expr* expr) {
…
```
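The general shape of this logic, a chain of predicate filters collected with `emplace_back` and applied to each expression, can be sketched in plain Python; the filter conditions and dictionary keys below are illustrative stand-ins, not the actual nvfuser API:

```python
# Hypothetical sketch of a filter chain like the one in lower_predicate.cpp.
# `filters` collects predicates; an expression needs a predicate only if
# every filter accepts it. All names here are made up for illustration.
filters = []

# e.g. only consider expressions that write to a tensor
filters.append(lambda expr: expr.get("writes_tensor", False))

# e.g. skip expressions that are already guarded elsewhere
filters.append(lambda expr: not expr.get("already_predicated", False))

def needs_predicate(expr):
    return all(f(expr) for f in filters)

print(needs_predicate({"writes_tensor": True}))                              # True
print(needs_predicate({"writes_tensor": True, "already_predicated": True}))  # False
```

A bug of the kind the report suggests would be a filter whose condition is inverted or too broad, so an expression that needs a predicate silently falls through the chain.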