-
### 🐛 Describe the bug
In primTorch, with regular PyTorch refs, it's easy to check them based on the `_refs` prefix and look them up in `__all__`. For nvfuser-specific ones, it's not yet clear what t…
-
To reproduce:
```py
from nvfuser import FusionDefinition, DataType
import torch
with FusionDefinition() as fd:
    t0 = fd.define_tensor(shape=[-1, -1], contiguity=[True, True], dtype=DataType.…
-
We currently need zeroed global memory buffers for cross-CTA communication. Our current executor calls `at::zeros` to initialize them before each launch of our nvfuser kernel, adding a handful of micr…
-
Integer arithmetic can follow the uniform data path if all the values match among threads within a warp, which improves performance by reducing interruptions to floating-point ops and non-uniform instr…
-
The output is as follows:
E:\GPT-SoVITS-beta0217>runtime\python.exe webui.py
Traceback (most recent call last):
  File "E:\GPT-SoVITS-beta0217\webui.py", line 4, in <module>
    import json,yaml,warnings,torch
File "E:\…
-
### 🚀 The feature, motivation and pitch
Currently, the handling of views in the scheduler is sub-optimal.
For views inside the fusion group that connects fusion, it makes sense, since this usually gives…
-
This test has failed several times in CI, seemingly due to tolerance.
```
00:31:45 FAILED tests/python/pytest_ops.py::test_correctness_truediv_complex64 - AssertionError: Tensor-likes are not close!
00:3…
-
With the current main branch (12/07/2023), the following fusion fails.
```
TEST_F(NVFuserTest, ConsecutiveOuterWelford) {
  std::unique_ptr<Fusion> fusion_ptr = std::make_unique<Fusion>();
auto fusion = fusi…
-
```
>>> import torch
>>> torch.cuda.is_initialized()
False
>>> import transformer_engine
>>> torch.cuda.is_initialized()
True
```
Import alone shouldn't initialize CUDA. Custom subprocess la…
-
### 🐛 Describe the bug
Current tests use double-precision constants passed to `where()`, which works. There is currently no `using Float = Scalar` scalar defined but we'd like to extend `where` to su…