-
### Describe the issue
In ONNX Runtime v1.18.1, the option ExecutionMode::ORT_PARALLEL can be set, which implies ops will run in parallel mode, but I can't find any multi-threaded executors; it only h…
zwyao updated
2 months ago
-
The [build documentation](https://oneapi-src.github.io/oneDNN/dev_guide_build_options.html#onednn-enable-primitive-gpu-isa) claims that generic OpenCL kernels are always available. I wanted to verify …
nwnk updated
3 months ago
-
Now that we have something like 40 kernels, it would be nice to have a graph that shows the evolution of the number of configs per kernel version.
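A minimal sketch of what such a graph could look like, using plain Python and an ASCII bar chart so it has no plotting dependency. The version labels and config counts below are made-up example data; the real numbers would come from the project's kernel/config registry.

```python
from collections import OrderedDict

# Hypothetical example data: number of configs per kernel version.
configs_per_version = OrderedDict([
    ("v1", 8),
    ("v2", 17),
    ("v3", 29),
    ("v4", 40),
])

def render_bar_chart(counts, width=40):
    """Render an ASCII bar chart of config counts per kernel version."""
    peak = max(counts.values())
    lines = []
    for version, n in counts.items():
        bar = "#" * round(n * width / peak)
        lines.append(f"{version:>4} | {bar} {n}")
    return "\n".join(lines)

print(render_bar_chart(configs_per_version))
```

A real version would likely pull the counts from git history (one data point per tagged release) and feed them to matplotlib instead of printing text.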
-
Hi,
I am working on a version of Mask_RCNN (https://github.com/matterport/Mask_RCNN) on TF 2.0 for Apple Silicon.
I have converted the project to TF 2.4 and it works; I mean there aren't any warning …
-
Hi,
I've been testing Trilinos and came across broken kk unit tests on H100s with CUDA 12.4. I have not tried to reproduce the broken test standalone, but I figured I'd report it. See configuration 1…
-
### 🐛 Describe the bug
When Inductor generates a kernel, it emits the code inside the async_compile.triton(...) call. The code inside this block is cached across different graphs.
However, a recent change introdu…
-
We need a microbenchmark to check performance regularly and guarantee there is no large regression after changes.
Currently we already have 130+ Triton non-GEMM kernels extracted from PyTorch E2…
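A minimal sketch of the kind of regression-checking harness this describes. Assumptions labeled here: the kernels are stand-in Python callables, the kernel names and baseline timings are hypothetical, and timing uses stdlib `timeit`; a real Triton harness would use `triton.testing.do_bench` and CUDA events instead.

```python
import timeit

# Hypothetical baseline timings (seconds per call) recorded on a known-good commit.
BASELINES = {"softmax_kernel": 2e-4, "layernorm_kernel": 3e-4}

def check_regression(kernels, baselines, tolerance=1.10, repeats=100):
    """Time each kernel callable and return those slower than baseline * tolerance.

    kernels:   mapping of kernel name -> zero-argument callable
    baselines: mapping of kernel name -> baseline seconds per call
    tolerance: allowed slowdown factor before flagging a regression
    """
    regressions = {}
    for name, fn in kernels.items():
        per_call = timeit.timeit(fn, number=repeats) / repeats
        if per_call > baselines[name] * tolerance:
            regressions[name] = per_call
    return regressions
```

Run in CI, an empty result means no kernel exceeded its baseline by more than the tolerance; a non-empty result names the regressed kernels with their measured times.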
-
## 🚀 Feature
cuDNN provides flexible support for performant GEMM/conv with fp8 quantization. Thunder, by introducing fp8 casts in its traces, can benefit from cuDNN fusions.
### Motivation
Today, thu…
-
When I run VarLiNGAM on the Finance dataset (http://www.skleinberg.org/data/FinanceCPT.tar.gz), I get a ValueError.
```
df_data = pd.read_csv(datafile)
model = VarLiNGAM(lag=3)
result = model.cr…
```
-
Does DeepSpeed support PyTorch code with [CUDA Graphs](https://pytorch.org/blog/accelerating-pytorch-with-cuda-graphs/)? If not, do you think it may be helpful to DeepSpeed users for further speedups?
…