gpu-kernels Search Results

borisdayma/dalle-mini #339

CustomCall failed: jaxlib/gpu/prng_kernels

While running this in Google Colab I get the following error: I am using the pro version of Google Colab. XlaRuntimeError Traceback (most recent call last) [](https://lo…

ERIK54600 updated 1 week ago

bclarkson-code/Tricycle #71

Optimised GPU kernels

Andrej Karpathy has just ~upstaged me~ released llm.c which contains some highly optimised CUDA kernels. If we include these into tricycle, we can probably get a significant performance boost for oper…

bclarkson-code updated 1 month ago

DEAL-US/SpaceRL-KG #1

Hello author, while reproducing your code I ran into the following error after running the code in trainer.py. Could you please tell me how to resolve it?

running integrity checks No embeddings have been generated for _utils datafolder is: E:\lh\SpaceRL-KG-master/datasets/COUNTRIES generating embeddings for dataset COUNTRIES and models ['TransE_l2'] …

lh5533223 updated 3 days ago

FluxML/Optimisers.jl #178

GPU kernels for optimizers

### Motivation and description Wondering what kind of speedup can be achieved by writing GPU kernels for optimizers. Take a look at @pxl-th's implementation of Adam below https://github.com/Jul…

vpuri3 updated 2 months ago

coreylowman/dfdx #925

Kernels written in rust-gpu

I'd like to add support for `rust-gpu` in the not-so-distant future. I have some questions while I figure out the plan: 1. Would it make sense to have shaders written with `rust-gpu` to be hung off…

LegNeato updated 3 weeks ago

NVIDIA/nccl #689

Does NCCL Allreduce kernels slowdown the computation kernels…

Hello, I'm trying to compare training speed between using 1 node and using 2 nodes (one GPU per node). From 1 node training, back-propagation (calculate gradients & update parameters) takes abo…

ihchoi12 updated 1 week ago

foundation-model-stack/fms-acceleration #76

Introduce Liger Fused Cross Entropy Kernel to FOAK Plugin

## Description Consider adding additional FusedCrossEntropyLoss kernel to FOAK set of kernels given the additional improvement seen using it in earlier tests (See Background below). Considerati…

achew010 updated 2 weeks ago

hpcaitech/ColossalAI #6047

[FEATURE]: Is it Possible to integrate Liger-Kernel?

### Describe the feature https://github.com/linkedin/Liger-Kernel Liger Kernel is a collection of Triton kernels designed specifically for LLM training. It can effectively increase multi-GPU train…

ericxsun updated 3 days ago

kokkos/kokkos-kernels #2316

kokkos kernels: broken unit test w/ cuda 12.4 on h100 gpus w…

Hi, I've been testing Trilinos and came across a broken kk unit test on H100s w/ CUDA 12.4. I have not tried to reproduce the broken test standalone but figured I'd report it. See configuration 1…

vasylivy updated 2 weeks ago

ROCm/rocprofiler #60

rocm profiler creates trace for 1 gpu only when kernels laun…

``` #include #include "hip/hip_runtime.h" // 1. if N is set to up to 1024, then sum is OK. // 2. Set N past the 1024 which is past No. of threads per blocks, and then all iterations of sum resu…

gggh000 updated 1 month ago

1000+ results for gpu-kernels
