-
Hello Folks,
I hope you are doing well.
What do we need to build our own ROCm hip-sdk from develop?
It seems 6.2 is currently delayed. I'm sure it will ship with the llvm-19 stack, but this is …
-
Thanks for sharing the code! If I change block_rows and block_cols in wmma_async_pg2s.cu to 256 and 128, I get an error. I can't see what the problem is...
```
./hgemm -M=4096 -N=4096 -K=1024 -profiling_iterations=1 -warmup_iterations=1 -enable_check=true
[HGEMM…
-
### Description
I am trying to figure out why I am getting the following error when I try to include `mma.h`.
```
PS D:\Users\Marko\Source\Repos\The Spiral Language\Spiral Compilation Tests> & '…
-
EfficientDet: Scalable and Efficient Object Detection
* paper: https://arxiv.org/abs/1911.09070v1
> First, we propose a weighted bi-directional feature pyramid network (BiFPN), which allows ea…
-
Hello,
I'm currently in the process of transitioning from CUDA to ROCm. During this transition, I've come to understand that rocWMMA can serve as a mapping library for the "Warp matrix functions **…
-
I compiled cutlass-bench and ran the simulator using PTX mode.
When I ran cutlass_perf_test, I ran into:
_**cutlass_perf_test: cuda_api_object.h:82: void CUctx_st::add_ptxinfo(const char*, cons…
-
Hi,
Since rocWMMA provides separate data types such as `rocwmma::bfloat16`, I wondered whether there are any functions that can convert a float to your rocWMMA half or bfloat16, like `__float2bfloat…
-
I want to be able to convert CUDA code containing wmma into HIP. I have written unit tests and it works. I hope to integrate this code into PyTorch. When I executed "python setup.py install", I found t…
-
I found that when tuning the fp16 tensorcore `dense_add` kernel, the tuning fails on some shapes and the reported error is non-deterministic.
For example, when the workload is `N=1, M=1000, K=512`,…
-
### Issue Type
Bug
### Tensorflow Version
Tensorflow-rocm v2.11.0-3797-gfe65ef3bbcf 2.11.0
### rocm Version
5.4.1
### Custom Code
Yes
### OS Platform and Distribution
Archli…