wmma-api Search Results

69 results for wmma-api

ROCm/ROCm #3445

Ubuntu 24.04 and LLVM-19

Hello Folks, I hope you are doing well. What do we need to build our own rocm hip-sdk from develop? as it seems currently 6.2 is delayed I'm sure it will ship with llvm-19 stack but this is …

waheedi updated 3 months ago, 17 comments

Bruce-Lee-LY/cuda_hgemm #3

Change to block of 128 by 256

Thanks for sharing the code! If I change block_rows and block_cols in wmma_async_pg2s.cu to 256 and 128, I get an error. I can't see what the problem is... ``` ./hgemm -M=4096 -N=4096 -K=1024 -profiling_iterations=1 -warmup_iterations=1 -enable_check=true [HGEMM…

yupei-ms updated 1 year ago, 3 comments

cupy/cupy #8146

Error when including `mma.h`: instance of overloaded functio…

### Description I am trying to figure out why I am getting the following error when I try to include `mma.h`. ``` PS D:\Users\Marko\Source\Repos\The Spiral Language\Spiral Compilation Tests> & '…

mrakgr updated 9 months ago, 11 comments

AlexeyAB/darknet #4346

EfficientDet: Scalable and Efficient Object Detection - 51.0…

EfficientDet: Scalable and Efficient Object Detection * paper: https://arxiv.org/abs/1911.09070v1 > First, we propose a weighted bi-directional feature pyramid network (BiFPN), which allows ea…

AlexeyAB updated 4 years ago, 164 comments

ROCm/rocWMMA #202

Advantages of Using rocWMMA over Compiler Intrinsics for CUD…

Hello, I'm currently in the process of transitioning from CUDA to ROCm. During this transition, I've come to understand that rocWMMA can serve as a mapping library for the "Warp matrix functions **…

xinyi-li7 updated 1 year ago, 6 comments

accel-sim/accel-sim-framework #169

Cannot run the simulator with deep benchmarks using tensor c…

I compiled cutlass-bench and ran the simulator using PTX mode. When I ran cutlass_perf_test, I came into: _**cutlass_perf_test: cuda_api_object.h:82: void CUctx_st::add_ptxinfo(const char*, cons…

LinlinCH updated 1 year ago, 14 comments

ROCm/rocWMMA #208

Any conversion functions for rocWMMA type

Hi, since rocWMMA provides separate datatypes like `rocwmma::bfloat16`, I wondered if there are any functions that can convert a float to your rocwmma half or bfloat16, like `__float2bfloat…

xinyi-li7 updated 1 year ago, 4 comments

ROCm/rocWMMA #239

Using rocwmma with pytorch

I want to be able to convert a cuda code containing wmma into hip. I have unit tests done and it works. I hope to integrate this code into pytorch. When I executed "python setup.py install", I found t…

fileaccent updated 1 year ago, 3 comments

apache/tvm #14137

[Bug][MetaSchedule] Failed to tune fp16 dense_add workload o…

I found that when tuning the fp16 tensorcore `dense_add` kernel, the tuning fails on some shapes and the reported error is non-deterministic. For example, when the workload is `N=1, M=1000, K=512`,…

wllqwzx updated 1 year ago, 9 comments

ROCm/ROCm #1880

7900 XTX Refuses to Run tensorflow-rocm Toy Example

### Issue Type Bug ### Tensorflow Version Tensorflow-rocm v2.11.0-3797-gfe65ef3bbcf 2.11.0 ### rocm Version 5.4.1 ### Custom Code Yes ### OS Platform and Distribution Archli…

Mushoz updated 2 months ago, 274 comments
