-
### Describe the issue
In ONNX Runtime v1.18.1, the option ExecutionMode::ORT_PARALLEL can be set, which implies ops will run in parallel mode, but I can't find any multi-threaded executors; it only h…
zwyao updated
2 months ago
-
The [build documentation](https://oneapi-src.github.io/oneDNN/dev_guide_build_options.html#onednn-enable-primitive-gpu-isa) claims that generic OpenCL kernels are always available. I wanted to verify …
nwnk updated
3 months ago
-
Now that we have something like 40 kernels, it would be nice to have a graph that shows the evolution of the number of configs per kernel version.
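A minimal sketch of what such a graph could look like, using plain Python and an ASCII bar chart so it has no plotting dependency. The version labels and config counts below are made-up example data; the real numbers would come from the project's kernel/config registry.

```python
from collections import OrderedDict

# Hypothetical example data: number of configs per kernel version.
configs_per_version = OrderedDict([
    ("v1", 8),
    ("v2", 17),
    ("v3", 29),
    ("v4", 40),
])

def render_bar_chart(counts, width=40):
    """Render an ASCII bar chart of config counts per kernel version."""
    peak = max(counts.values())
    lines = []
    for version, n in counts.items():
        bar = "#" * round(n * width / peak)
        lines.append(f"{version:>4} | {bar} {n}")
    return "\n".join(lines)

print(render_bar_chart(configs_per_version))
```

A real version would likely pull the counts from git history (one data point per tagged release) and feed them to matplotlib instead of printing text.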
-
Hi,
I am working on a version of Mask_RCNN (https://github.com/matterport/Mask_RCNN) on TF 2.0 for Apple Silicon.
I have converted the project to TF 2.4 and it works; I mean there aren't any warning …
-
Hi,
I've been testing Trilinos and came across broken kk unit tests on H100s with CUDA 12.4. I have not tried to reproduce the broken test standalone, but I figured I'd report it. See configuration 1…
-
### 🐛 Describe the bug
When Inductor generates a kernel, it emits the code inside the async_compile.triton(...) call. The code inside this block is cached across different graphs.
However, a recent change introdu…
-
We need a microbenchmark to check performance regularly and guarantee there is no large regression after changes.
Currently we already have 130+ Triton non-GEMM kernels extracted from PyTorch E2…
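A minimal sketch of the kind of regression-checking harness this describes. Assumptions labeled here: the kernels are stand-in Python callables, the kernel names and baseline timings are hypothetical, and timing uses stdlib `timeit`; a real Triton harness would use `triton.testing.do_bench` and CUDA events instead.

```python
import timeit

# Hypothetical baseline timings (seconds per call) recorded on a known-good commit.
BASELINES = {"softmax_kernel": 2e-4, "layernorm_kernel": 3e-4}

def check_regression(kernels, baselines, tolerance=1.10, repeats=100):
    """Time each kernel callable and return those slower than baseline * tolerance.

    kernels:   mapping of kernel name -> zero-argument callable
    baselines: mapping of kernel name -> baseline seconds per call
    tolerance: allowed slowdown factor before flagging a regression
    """
    regressions = {}
    for name, fn in kernels.items():
        per_call = timeit.timeit(fn, number=repeats) / repeats
        if per_call > baselines[name] * tolerance:
            regressions[name] = per_call
    return regressions
```

Run in CI, an empty result means no kernel exceeded its baseline by more than the tolerance; a non-empty result names the regressed kernels with their measured times.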
-
## 🚀 Feature
cuDNN provides flexible support for performant GEMM/conv with fp8 quantization. Thunder, by introducing fp8 casts in its traces, can benefit from cuDNN fusions.
### Motivation
Today, thu…
-
When I run VarLiNGAM on the Finance dataset (http://www.skleinberg.org/data/FinanceCPT.tar.gz), I get a ValueError.
```
df_data = pd.read_csv(datafile)
model = VarLiNGAM(lag=3)
result = model.cr…
```
-
Does DeepSpeed support PyTorch code with [CUDA Graphs](https://pytorch.org/blog/accelerating-pytorch-with-cuda-graphs/)? If not, do you think it may be helpful to DeepSpeed users for further speedups?
…