wmma-api Search Results

69 results for wmma-api


ashawkey/torch-ngp #4

Compilation issue - RuntimeError: Error building extension '…

Thanks for the nice work! I met the following issue when I run `python train_nerf.py data/fox --workspace trial_nerf`. Do you have any thoughts? Many thanks for your help! ``` Traceback (most rece…

wangjksjtu updated 2 years ago
9
NVIDIA/FasterTransformer #145

CUDA runtime error: CUBLAS_STATUS_INVALID_VALUE when running…

## Description I could build and run C++ encoder samples successfully as described in the README. Output sample: ``` $ ./bin/encoder_sample 32 12 32 12 64 0 0 0 0 Device Tesla V100-SXM2-16GB D…

amralaa-MSFT updated 2 years ago
18
accel-sim/accel-sim-framework #4

Syntax error on mma.sync

Hi! I'm trying to simulate the [volta_tensorop_gemm.cu](https://github.com/NVIDIA/cutlass/blob/master/examples/07_volta_tensorop_gemm/volta_tensorop_gemm.cu) in cutlass. I directly use the docker i…

apuaaChen updated 4 years ago
1
apache/tvm #4105

[RFC] Auto TensorCore CodeGen

We propose a solution for TensorCore CodeGen with significant transparency, flexibility and usability. In this solution, the algorithm description and schedule of TensorCore CodeGen is no different th…

minminsun updated 4 years ago
14
JuliaGPU/CUDA.jl #711

Ballot intrinsics should use .sync variety

**Describe the bug** A clear and concise description of what the bug is. Hi, I just installed CUDA for the first time on a clean julia environment for julia `v1.6-rc1` and `] test CUDA` fails. …

JonasIsensee updated 3 years ago
3
kblomdahl/dream-go #35

Use cuTLASS instead of cuDNN for convolutions

Replace all of our usages of [cuDNN](https://developer.nvidia.com/cudnn) with [cuTLASS](https://github.com/NVIDIA/cutlass). This would have several advantages: - [cuTLASS has a BSD-3 license](https…

kblomdahl updated 4 years ago
3
tlc-pack/tvm-tensorir #48

[TASK] TextPrinter Support for Unified TVM IRs

So far we have a text printer for Relay, which allows us to print an IRModule into text format. On the TIR side, we still rely on the ReprPrinter. This issue is for upgrading the text printe…

tqchen updated 4 years ago
28
tlc-pack/tvm-tensorir #51

[DISCUSS] GPU support and memory hierarchy

## Introduce block_hierarchy Hardware chips usually have more than one storage and execution hierarchy. As for NVIDIA GPUs, they have GPU blocks(GPU SMs), warp and CUDA cores with global, shared and …

Hzfengsy updated 4 years ago
9
JuliaGPU/CUDAnative.jl #561

WMMA examples always execute

I tested this package and it shows me this error. My Julia version is 1.4.0-rc1.0 ``` julia> using CUDAdrv, CUDAnative, CuArrays ┌ Warning: Incompatibility detected between CUDA and LLVM 8.0+; disabling debug i…

solivehong updated 4 years ago
2
NVIDIA/cutlass #130

Using CUTLASS to benchmark plain CUDA performance

Hi, I am trying to see what the best performance for row-major SGEMM for 4 specific input sizes is, when only using plain CUDA (no tensor cores, no intrinsics). This is useful to me, because I want…

richardschulze updated 4 years ago
12
