gpu-kernels Search Results

1000+ results for gpu-kernels

Best match


hpsfoundation/tac #4

Lifecycle Policy: Template Update for Extra Questions (Core)

Proposal to fine-tune the questions in the new project template: https://github.com/hpsfoundation/tac/blob/main/.github/ISSUE_TEMPLATE/new-project-proposal.md Moved out from #2 I think also rel…

ax3l updated 1 week ago
1
vllm-project/vllm #7524

[Feature]: need a GB-based alternative for gpu_memory_utiliz…

### 🚀 The feature, motivation and pitch I'm struggling to figure out how to extend our test suite to include vllm tests. The problem is that by default vllm will take over the whole gpu, which prev…

stas00 updated 1 month ago
9
CliMA/Oceananigans.jl #3693

Reducing parameter space usage by `LatitudeLongitudeGrid`

It's come to light that the `LatitudeLongitudeGrid` consumes almost 1 KB of parameter space as an argument on the GPU. This is a problem because at least for some versions of CUDA + GPUs (unsure how m…

glwagner updated 1 month ago
3
openxla/xla #16914

XLA does too many un-fused transposes

(This is running on an Nvidia 4090 GPU, with jax '0.4.31') I have got something like the example below. Here, the depth-wise convolution wants the input to be transposed from [batch, sequence…

ywrt updated 1 week ago
3
pytorch/pytorch #134459

linalg.lu_factor: LU without pivoting is not implemented on…

### 🐛 Describe the bug ```python import torch print(torch.__version__) A = torch.tensor([ [1,1,1], [1,2,2], [1,2,3] ], dtype=torch.float32) l, u = torch.linalg…

ashok-arora updated 4 days ago
13
vllm-project/vllm #6565

[Feature]: Any thoughts about MI50 support ?

### 🚀 The feature, motivation and pitch The MI50 is like a 2080 Ti, but much cheaper (1/4) and with 16 GB of memory. But when I tried to compile it on an MI50 machine, I got this: [ 83%] Building HIP obj…

linchen111 updated 2 months ago
3
minhhoai1001/ppocr-vietnamese #1

OSError: (External) CUDNN error(9), CUDNN_STATUS_NOT_SUPPORT…

Error when running inference with both the detect and rec models at the same time using this code: !python tools/infer/predict_system.py \ --image_dir="./train_data/vietnamese/test_image/im1500.jpg" \ --det_model_di…

truong04 updated 2 weeks ago
1
pytorch/torchtune #1493

Some NCCL operations have failed or timed out. Due to the as…

Hey, I have seen the previous issues. Based on those, I tracked down the approximate lane where the pipeline is stuck, which is the setup function, where it failed to load the model. The training is n…

Vattikondadheeraj updated 9 hours ago
5
ROCm/rocprofiler #145

[Issue]: rocprofv2 get more kernel dispatches than rocprofv1

### Problem Description When using API trace on vllm inference, rocprof gets fewer kernel dispatch records than rocprof_v2. Which result tends to be correct? Possible reasons for the mismatch between ker…

hgtsoi updated 2 weeks ago
3
bytedance/flux #36

[QUESTION] The gemm time on GPU of different rank under tp8 …

**Your question** Ask a clear and concise question about Flux. There is a torch.Size([5120, 1024]) x torch.Size([8192, 1024]) gemm_rs op in my project, fp16. I made a benchmark on A100: torch.Size(…

Rainlin007 updated 2 weeks ago
8
