-
## 🐛 Bug
With the newest version of the Docker image (tested on 2024-04-28), training with thunder.jit on 8xA100 fails for the Platypus-30B and vicuna-33b-v1.3 models. This is the error:
> …
-
### 🐛 Describe the bug
NVFuser is hitting a fallback when running demucs on torchbench + lazy tensor.
Repro:
```python
import torch
from torch.utils.jit.log_extract import run_nvfuser, load_gra…
```
-
### 🚀 The feature, motivation and pitch
In particular, `hf_DistilBert_forward_0` has ~30% regression compared to eager.
Status document (for those who have access): https://docs.google.com/docum…
-
Per the title, this has happened on V100 and A100 a few times before:
```python
00:13:04 =================================== FAILURES ===================================
00:13:04 _____________________…
```
-
FP max reductions would typically look like:
```
for(nvfuser_index_t i154 = 0; i154 < 8; ++i154) {
int i299;
i299 = 4 * i154;
#pragma unroll
  for(nvfuser_index_t i156 …
```
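For reference, the eager-mode computation such a generated kernel lowers is simply a floating-point max reduction; a minimal PyTorch sketch (the shape here is an assumption, not taken from the kernel above):

```python
import torch

# Hypothetical shape; the outer loop above iterates i154 over 8 rows.
x = torch.randn(8, 4)

# FP max reduction over the inner dimension -- the op the kernel implements.
out = torch.amax(x, dim=1)
```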
-
I'm separating this from #1502. While we can get rid of `cat` in some cases, improving nvFuser's codegen for `slice` and `cat` will still benefit the RoPE module and the QKV split around it.
https…
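For context, the slice/cat pattern in question is the standard rotate-half helper used by RoPE implementations; a minimal sketch (names and shapes are illustrative, not taken from the issue):

```python
import torch

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # Two `slice` ops followed by one `cat`: the pattern whose
    # codegen this issue wants to improve.
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(q: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # Standard rotary position embedding application.
    return q * cos + rotate_half(q) * sin
```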
-
## 🐛 Bug
Repro script from @nikitaved
```python
import itertools
import torch
from torch.utils.benchmark import Timer, Compare
import thunder
#dtypes = [torch.bfloat16, torch.float, torch.double…
```
-
This test fails to compile:
```python
def test_simple_slice_fusion_bfloat16(self):
    inputs = [torch.randn((10,), dtype=torch.bfloat16, device="cuda:0")]
    def fusion_func(fd: Fu…
```
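For reference, the eager-mode equivalent the fusion should match is just a slice of a bfloat16 tensor; a minimal PyTorch sketch (run on CPU here, and the slice bounds are assumptions since the test body is truncated):

```python
import torch

# bfloat16 input analogous to the test's (10,) CUDA tensor, on CPU here.
t = torch.randn((10,), dtype=torch.bfloat16)

# Hypothetical slice bounds; this is the eager reference the fusion
# would be checked against.
ref = t[2:8]
```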
-
### 🐛 Describe the bug
Per the [setup.py](https://github.com/pytorch/pytorch/blob/main/setup.py#L171) and past experience from torch 1.13, setting the `CUDNN_LIB_DIR` environment variable should po…
-
This test started failing a couple of days ago.
`python_tests.test_normalization.test_instance_norm_multigpu`
See also https://gitlab-master.nvidia.com/dl/pytorch/update-scripts/-/issues/50225