-
**What is your question?**
The mainloop fusion examples provided in [25_ampere_fprop_mainloop_fusion](https://github.com/NVIDIA/cutlass/tree/main/examples/25_ampere_fprop_mainloop_fusion) and [26_…
-
I'd like to facilitate a discussion on how to represent hierarchical programs in our IR. This topic has come up repeatedly in various contexts so there might be an opportunity to introduce a nice abst…
-
When I run `python extract_mesh_tsdf.py -m exp_playroom/release --iteration 30000`, I get the issues below:
```
Looking for config file in exp_playroom/release/cfg_args
Config file found: exp_playroom/…
-
The first line works; the second raises an exception:
```
import numpy as np
import xarray as xr
import cupy_xarray
xr.DataArray([1, 2, np.nan]).chunk(dim_0=1).as_cupy().sum().compute()
xr.Data…
-
Hi, TE is really great work.
How can I use FusedRMSNorm in TE?
https://github.com/NVIDIA/apex/blob/master/apex/normalization/fused_layer_norm.py#L329
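
For context, the computation that an RMSNorm layer performs can be sketched in plain Python. This is only a reference for the math, not the fused kernel in apex or TE; the function name `rms_norm` and the default `eps` are assumptions made for illustration.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm over one vector: x / sqrt(mean(x^2) + eps) * weight.

    Hypothetical helper for illustration; not an apex or TE API.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

# With unit weights, the output has a mean square of approximately 1.
out = rms_norm([3.0, 4.0], [1.0, 1.0])
out_mean_sq = sum(v * v for v in out) / len(out)
```

A fused implementation would be expected to match this reference up to floating-point rounding, which makes a sketch like this useful as a correctness check.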
-
# The Climate Modeling Alliance
## Software Design Issue 📜
### Purpose
Demonstrate 0.5 s per timestep at ~110 km resolution with 64 levels with all (but gravity wave) parameterizations on a single …
-
### 🐛 Describe the bug
When I fuse two modules (e.g., `Conv2d` and `BatchNorm2d`) and test with 1,000 random inputs, I find that the output produced by `fuse_modules` is inconsistent with t…
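
As background, the usual algebra for folding an eval-mode batch norm into the preceding convolution can be sketched per output channel in plain Python. This is a minimal sketch that uses a scalar affine op as a stand-in for one channel of a 1x1 convolution; the helper names `fold_bn` and `conv_then_bn` and the parameter values are assumptions, and it is not presented as what `fuse_modules` does internally.

```python
import math

def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold eval-mode BatchNorm statistics into a preceding affine op (one channel)."""
    scale = gamma / math.sqrt(var + eps)
    return w * scale, (b - mean) * scale + beta

def conv_then_bn(x, w, b, gamma, beta, mean, var, eps=1e-5):
    """Unfused reference: affine op followed by BatchNorm using running statistics."""
    y = w * x + b  # stand-in for one output channel of a 1x1 convolution
    return (y - mean) / math.sqrt(var + eps) * gamma + beta

# Hypothetical parameter values chosen only for illustration.
params = dict(w=0.7, b=-0.2, gamma=1.3, beta=0.4, mean=0.1, var=2.5)
w_f, b_f = fold_bn(**params)

# Compare fused and unfused outputs on a few inputs.
max_diff = max(
    abs(conv_then_bn(x, **params) - (w_f * x + b_f))
    for x in [-3.0, 0.0, 1.5, 10.0]
)
```

The two paths agree only up to floating-point rounding, not bit for bit, and only when the batch norm uses its running statistics (eval mode). A comparison between fused and unfused outputs therefore needs a tolerance rather than exact equality.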
-
### 🚀 The feature, motivation and pitch
Here is some brief context:
Right now inductor decides loop ordering before fusion. That can lose some fusion opportunities. E.g. if node1 and node2 pick in…
-
### 🚀 The feature, motivation and pitch
Repro of the issue:
```
def test(x):
    y = torch.sum(x)
    z = (x + x.t()) / 10.0  # x needs to be a square tensor here
    return y, …
-
Hello,
NVIDIA MLPerf suggests using the [TensorRT](https://github.com/NVIDIA/TensorRT) framework for performant inference deployment. For DLRM (DL-based Recommendation Systems) inference on GPU, I …