-
```
#include
#include "hip/hip_runtime.h"
// 1. If N is at most 1024, the sum is correct.
// 2. If N exceeds 1024, i.e. more than the maximum number of threads per block, all iterations of the sum resu…
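// Sketch (host-side C++ only, no HIP calls) of why a single-block sum breaks
// once N exceeds the per-block thread limit, and how splitting the work
// across blocks avoids it. Assumptions: the limit is 1024 threads per block
// (typical for current HIP/CUDA devices), and blocksFor / blockwiseSum are
// hypothetical helpers, not part of the original program. Each simulated
// block produces a partial sum and the partials are then combined, which is
// the usual shape of a multi-block GPU reduction.
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kMaxThreadsPerBlock = 1024;  // assumed device limit

// Number of blocks needed so that each of the n elements gets its own thread.
std::size_t blocksFor(std::size_t n, std::size_t threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Host simulation of a two-stage (per-block, then global) reduction.
long long blockwiseSum(const std::vector<int>& data,
                       std::size_t threadsPerBlock = kMaxThreadsPerBlock) {
    // A real launch with more threads per block than the limit would fail.
    assert(threadsPerBlock > 0 && threadsPerBlock <= kMaxThreadsPerBlock);
    const std::size_t n = data.size();
    // Stage 1: one partial sum per block.
    std::vector<long long> partial(blocksFor(n, threadsPerBlock), 0);
    for (std::size_t b = 0; b < partial.size(); ++b) {
        for (std::size_t t = 0; t < threadsPerBlock; ++t) {
            const std::size_t i = b * threadsPerBlock + t;  // global thread index
            if (i < n) partial[b] += data[i];  // last block may be only partly full
        }
    }
    // Stage 2: combine the per-block partial sums.
    long long total = 0;
    for (long long p : partial) total += p;
    return total;
}
// Usage: N = 3000 needs blocksFor(3000, 1024) == 3 blocks, and
// blockwiseSum(std::vector<int>(3000, 1)) returns 3000.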
```
-
Hello,
I am using concurrent kernel execution on a multi-GPU system with multiple streams (see the code example below).
Example:
```
for(i = 0; i < GPU_N; i++)
{
....
//Set device
hipSetDevice(i)…
```
-
Hi,
I've been testing Trilinos and came across broken kk unit tests on H100s with CUDA 12.4. I have not tried to reproduce the broken test standalone, but figured I'd report it. See configuration 1…
-
While installing torch-points-kernels with `pip install torch-points-kernels==0.7.0`, I ran into an error while building the wheel.
Error:
-->
Building wheel for torch-points-kerne…
-
### Feature request
Integrate the Liger (LinkedIn GPU Efficient Runtime) Kernel into the HuggingFace Trainer, so that users can decide whether to enable the kernel with a simple flag.
### Motivation
Liger (Linkedi…
-
Undefined symbol errors encountered during compilation and linking
```
# Copyright (C) RongTao, All rights reserved.
CUDA_INSTALL_PATH = /usr/local/cuda-12.1
GCC_INSTALL_PATH = /usr
NVCC = $(CUDA_INSTALL_PATH)/bin/nvcc #cuda_12.1.r12.1
GCC =…
```
SIKtt updated 2 months ago
-
This is an interesting application of Devito. I was wondering whether you know of a way to expose GPU computing in Devito. All the examples I see go PyTorch GPU -> NumPy CPU -> Devito -> NumPy CPU -> PyTorch…
-
### Search before asking
- [X] I have searched for the question and found no related answer.
### Please ask your question
I trained the U2NET model on my own annotated dataset. At training iter==100, c…
-
### Describe the feature request
Request:
Leverage `onnxruntime-web` kernels to create a native WebGPU Execution Provider for **non-web** environments.
Story:
I am in a unique situation where my…
-
### 🚀 The feature, motivation and pitch
MSCCL++ redefines inter-GPU communication interfaces, offering a highly efficient and customizable communication stack tailored for distributed GPU application…