-
### 🐛 Describe the bug
Creating an issue for https://github.com/csarofeen/pytorch/pull/1930#issuecomment-1232175340
```
PYTORCH_NVFUSER_ENABLE="transpose_scheduler" ./build/bin/nvfuser_bench "-…
-
After enabling aten2aten decomps:
===== dlrm_backward_0 ======
Generating testing data...
aten2aten decomp: aten.detach.default
aten2aten decomp: aten.detach.default
aten2aten decomp: aten.deta…
-
The current reduction scheduler limits the types of epilogue pointwise ops that can be fused, via `SchedulerTopologyChecker`.
It needs further work in the following areas:
(1) the tests are missing
(2) th…
-
## 🐛 Bug
When benchmarking the model 'Mixtral-8x7B-v0.1', we get OOM errors even with `--checkpoint_activations True`.
The same configuration works with torch.compile.
Might be related to [https://gi…
-
Noticed a performance regression in layer norm backward from July 19 to July 26 (no CI data from July 22-25) on H100 with hidden sizes around 15K to 16K. See [dashboard](http://nv/eh3). SOL dropped …
-
From the CI run in an intermediate version of #691 :
FAILED thunder/tests/test_grad.py::test_vjp_correctness_sdpa_manual_grad_forward_scaled_dot_product_attention_nvfuser_cuda_thunder.dtypes.float1…
-
The following code results in a segfault as of yesterday (e.g. commit 1a5db862df21e5dabaeb0f3648a012ea60cee8c3)
```python
import torch
import torch.nn.functional as F
import nvfuser
def test_em…
-
# Repro
This is a follow-up to https://github.com/NVIDIA/Fuser/pull/1649#discussion_r1535920468.
I created a simple repro on the `wjy/input` branch, which you can run with:
```
$ git fetch origin wjy/input
…
-
### 🐛 Describe the bug
In the TorchDynamo+AOT_Autograd+Primtorch stack, Dropout is currently implemented as a decomposition when traced by AOT_Autograd. The decomposition calls `rand_like` to provid…
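The decomposition described above can be sketched as follows. This is a minimal, hypothetical illustration of the general pattern (a `rand_like`-based keep-mask plus rescaling), not the exact decomposition in the PrimTorch source; the function name `dropout_decomp` is made up for this sketch.

```python
import torch

def dropout_decomp(x: torch.Tensor, p: float, train: bool) -> torch.Tensor:
    """Sketch of dropout as a decomposition: rand_like provides the
    randomness for a Bernoulli(1 - p) keep-mask, and surviving elements
    are rescaled by 1 / (1 - p) so the expected value is preserved."""
    if not train or p == 0.0:
        return x
    keep_mask = torch.rand_like(x) > p  # True where the element is kept
    return x * keep_mask / (1.0 - p)

x = torch.ones(4, 4)
out = dropout_decomp(x, p=0.5, train=True)
# Each element is either dropped (0.0) or kept and scaled (2.0).
```

Because the mask comes from `rand_like`, tracing this decomposition bakes the RNG call into the graph, which is the crux of the reproducibility concern raised in this issue.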
-
I am getting this strange error, but despite it, RFdiffusion actually seems to run fine: I still get output structures that look correct. Is this something I should be concerned about?
```
/usr/…