TiledTensor / TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
MIT License

The `b2b_gemm` Example Fails Tests on A100 #144

Open haruhi55 opened 2 months ago

haruhi55 commented 2 months ago

When I run the `b2b_gemm` example on an A100, it fails with the following error:

```
[16, 16, 16, 16], batch = 1, passed.
[16, 32, 16, 32], batch = 1, passed.
[32, 64, 32, 64], batch = 1, passed.
[64, 64, 32, 64], batch = 1, passed.
[256, 128, 64, 64], batch = 1, passed.
[1024, 1024, 128, 128], batch = 1, passed.
[16, 16, 16, 16], batch = 2, passed.
terminate called after throwing an instance of 'thrust::THRUST_200301_800_NS::system::system_error'
  what():  trivial_device_copy D->H failed: cudaErrorIllegalAddress: an illegal memory access was encountered
[1]    972511 IOT instruction (core dumped)  ./fused_gemms
```

KuangjuX commented 2 months ago

I encountered a similar issue before. It seems that this is not a problem with the kernel, but rather a memory access error that occurs when running multiple tests consecutively. The specific reason is currently unclear.
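
Since the crash in the log above appears only after the tests switch to batch = 2, one cheap thing to rule out is a host-side sizing mismatch between the allocated buffers and what a batched run reads or writes. Below is a minimal sketch of such a check. It is not taken from the TiledCUDA sources: `B2bShape`, `required_elems`, and `fits` are hypothetical names, and it assumes the four numbers in each log line are `[M, N, K, P]` for `D = (A @ B) @ C` with `A` of shape `M x K`, `B` of shape `K x N`, `C` of shape `N x P`, and `D` of shape `M x P`, each stored contiguously per batch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of one test case. The meaning of the four
// dimensions is an assumption, not confirmed by the issue.
struct B2bShape {
    std::size_t m, n, k, p, batch;
};

// Number of elements each buffer must hold for the whole batch.
struct B2bSizes {
    std::size_t a, b, c, d;
};

// Assumed layout: A is M x K, B is K x N, C is N x P, D is M x P,
// with batches stored back to back.
inline B2bSizes required_elems(const B2bShape& s) {
    return B2bSizes{
        s.batch * s.m * s.k,  // A
        s.batch * s.k * s.n,  // B
        s.batch * s.n * s.p,  // C
        s.batch * s.m * s.p   // D
    };
}

// True when an allocation of `allocated` elements covers `required` elements.
inline bool fits(std::size_t allocated, std::size_t required) {
    return allocated >= required;
}
```

Calling `fits` on every buffer before each launch would flag, for example, an allocation that was sized for a single batch and then reused for a batch = 2 case. If all buffers fit, the sizing hypothesis can be dropped and the out-of-bounds access is more likely inside the kernel's own indexing or in state left over from an earlier test.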