nvfuser Search Results - Githubissues

1000+ results
for nvfuser

NVIDIA/Fuser #2418

Data-driven heuristics

This is a medium- to long-term proposal to modify our heuristics system to enable some machine-specific information to be used to choose heuristic parameters. The goal is to have a python script th…

jacobhinkle updated 2 months ago
2

NVIDIA/Fuser #374

`FusionRepro2094_CUDA` is failing

```C++ [ RUN ] NVFuserTest.FusionRepro2094_CUDA unknown file: Failure C++ exception with description "aten_output_tensor.allclose( fusion_output_tensor.to(aten_output_tensor.dtype()), toleranc…

zasdfgbnm updated 1 year ago
3

pytorch/pytorch #67540

LTC asynchronous executor is not synced with python code and…

## 🐛 Bug I'm observing a behavior where LTC is not respecting python context manager. e.g. ``` # enabling nvfuser instead of NNC for TorchScript fusion with torch.jit.fuser("fuser2"): for …

jjsjann123 updated 2 years ago
3

NVIDIA/Fuser #540

Vectorization of loads in horizontally fused `slice` operati…

This is an issue for evaluation by Jie. The expected SOL of the kernel on A100 is `(16*128*3072 * 2 * 4) / 1935*1e9 = 26 us`. The measured kernel time is `~63us` giving a `41 %SOL`. Notably, none…

kevinstephano updated 1 year ago
17

pytorch/benchmark #2202

eval(f.read()) is used, which will cause security issue

in https://github.com/pytorch/benchmark/blob/main/torchbenchmark/util/classify_graphs.py#L172

duck7216 updated 7 months ago
1

NVIDIA/Fuser #2146

LayerNorm fusion in Diffusion Transformer has worse performa…

nvFuser generated code for a fusion block present in DiT has worse than expected performance. The subgraph is performing a `LayerNorm + + Mul + Add + Add` computation as shown in the code below. nvFus…

parthmannan updated 3 months ago
17

csarofeen/pytorch #2165

Compilation error on reduction of expanded input

### 🐛 Describe the bug ```C++ TEST_F(NVFuserTest, FusionExpandedInput2_CUDA) { std::unique_ptr fusion_ptr = std::make_unique(); auto fusion = fusion_ptr.get(); FusionGuard fg(fusion); …

zasdfgbnm updated 1 year ago
1

NVIDIA/Megatron-LM #937

[BUG]Get an AtrributeError when trying to finetune llama3-8B…

**Describe the bug** I try to finetune `llama3-8B` model with multi nodes but get an AtrributeError when finishing loading mcore format checkpoint and starting to build datasets, the error is below: …

nakroy updated 2 weeks ago
5

csarofeen/pytorch #2549

broadcast_in_dim: The size of contiguity must equal to the n…

### 🐛 Describe the bug ```py from nvfuser import FusionDefinition, DataType import torch def nvfuser_fusion_id2(fd : FusionDefinition) -> None : T0 = fd.define_tensor(symbolic_sizes=[-1, 1]…

IvanYashchuk updated 1 year ago
6

pytorch/pytorch #86717

NVFuser `FusionRootMappingMultipleBroadcast_CUDA` raises exc…

### 🐛 Describe the bug See here for example (only change that test is compiled with sm_86 and run on A10G ) https://github.com/pytorch/pytorch/actions/runs/3222501479/jobs/5272211415 ``` [ RUN …

malfet updated 2 years ago
2
