-
This is a medium- to long-term proposal to modify our heuristics system so that machine-specific information can be used when choosing heuristic parameters.
The goal is to have a python script th…
-
```C++
[ RUN ] NVFuserTest.FusionRepro2094_CUDA
unknown file: Failure
C++ exception with description "aten_output_tensor.allclose( fusion_output_tensor.to(aten_output_tensor.dtype()), toleranc…
-
## 🐛 Bug
I'm observing that LTC does not respect the Python context manager.
e.g.
```
# enabling nvfuser instead of NNC for TorchScript fusion
with torch.jit.fuser("fuser2"):
for …
-
This is an issue for evaluation by Jie.
The expected SOL of the kernel on A100 is `(16*128*3072 * 2 * 4) / (1935*1e9) = 26 us`. The measured kernel time is `~63us`, giving `41% SOL`. Notably, none…
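
The speed-of-light estimate above can be reproduced with a short calculation. This is only a sketch: it assumes the factor `4` is bytes per element, the factor `2` is the number of passes over the `16 x 128 x 3072` tensor (for example one read and one write), and `1935` is the A100 peak memory bandwidth in GB/s. Those interpretations and the helper name `sol_time_us` are assumptions for illustration and are not stated in the issue.

```python
def sol_time_us(num_elements, bytes_per_element, num_passes, bandwidth_gb_per_s):
    """Lower-bound kernel time in microseconds for a bandwidth-bound kernel."""
    # Total bytes moved through memory (assumed: every pass touches every element).
    total_bytes = num_elements * bytes_per_element * num_passes
    # Convert GB/s to bytes/s, then seconds to microseconds.
    seconds = total_bytes / (bandwidth_gb_per_s * 1e9)
    return seconds * 1e6


# Figures taken from the issue text; their meaning is assumed as described above.
expected_us = sol_time_us(16 * 128 * 3072, 4, 2, 1935)
measured_us = 63.0  # approximate measured kernel time from the issue

# Fraction of the speed-of-light bound actually achieved.
pct_sol = 100.0 * expected_us / measured_us

print(f"expected: {expected_us:.1f} us, achieved: {pct_sol:.0f}% of SOL")
```

Note that the divisor is parenthesized as `(bandwidth_gb_per_s * 1e9)`; dividing by `1935` and then multiplying by `1e9` would give a result off by a factor of `1e18`.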
-
in https://github.com/pytorch/benchmark/blob/main/torchbenchmark/util/classify_graphs.py#L172
-
nvFuser-generated code for a fusion block in DiT performs worse than expected. The subgraph performs a `LayerNorm + + Mul + Add + Add` computation, as shown in the code below. nvFus…
-
### 🐛 Describe the bug
```C++
TEST_F(NVFuserTest, FusionExpandedInput2_CUDA) {
std::unique_ptr<Fusion> fusion_ptr = std::make_unique<Fusion>();
auto fusion = fusion_ptr.get();
FusionGuard fg(fusion);
…
-
**Describe the bug**
I am trying to fine-tune the `llama3-8B` model on multiple nodes, but I get an AttributeError after the mcore-format checkpoint finishes loading and dataset building starts. The error is below:
…
-
### 🐛 Describe the bug
```py
from nvfuser import FusionDefinition, DataType
import torch
def nvfuser_fusion_id2(fd : FusionDefinition) -> None :
T0 = fd.define_tensor(symbolic_sizes=[-1, 1]…
-
### 🐛 Describe the bug
See here for an example (the only change is that the test is compiled for sm_86 and run on an A10G):
https://github.com/pytorch/pytorch/actions/runs/3222501479/jobs/5272211415
```
[ RUN …