codeplaysoftware / cutlass-fork

CUDA Templates for Linear Algebra Subroutines

Remove caching effects in the Benchmarks #136

Closed AD2605 closed 1 month ago

AD2605 commented 2 months ago

Uses a memset operation to invalidate the cache by changing the data in the block_A, block_B, and block_C matrices. I chose memset over ping-pong buffers or re-initializing the data because those alternatives would be too slow and would make the benchmarks take much longer to run.
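For illustration, here is a minimal host-side sketch of the pattern described above. It is not the code from this PR: the `Buffers` struct, `invalidate_with_memset`, and `benchmark` are hypothetical names, `std::memset` on host vectors stands in for whatever device-side memset the benchmarks actually call, and keeping the memset outside the timed region is an assumption.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the benchmark's operand buffers
// (the names mirror block_A, block_B, block_C from the description).
struct Buffers {
    std::vector<float> block_A, block_B, block_C;
};

// Overwrite every operand buffer so its contents differ from the previous run.
// The fill byte varies with the iteration index (kept in 0..127).
inline void invalidate_with_memset(Buffers& buf, int iteration) {
    const int fill = iteration & 0x7F;
    std::memset(buf.block_A.data(), fill, buf.block_A.size() * sizeof(float));
    std::memset(buf.block_B.data(), fill, buf.block_B.size() * sizeof(float));
    std::memset(buf.block_C.data(), fill, buf.block_C.size() * sizeof(float));
}

// Runs `kernel` `iterations` times and returns the mean time per run in seconds.
// Assumption: the memset happens before the timer starts, so only the kernel
// call itself is measured.
template <typename Kernel>
double benchmark(Buffers& buf, int iterations, Kernel&& kernel) {
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;
    std::chrono::duration<double> total{0};
    for (int i = 0; i < iterations; ++i) {
        invalidate_with_memset(buf, i);   // not timed
        const auto start = clock::now();
        kernel(buf);                      // timed region
        total += clock::now() - start;
    }
    return total.count() / iterations;
}
```

A caller would pass the GEMM launch (plus any device synchronization it needs) as `kernel`. A single memset per buffer is one linear pass over the data, which is why it is cheaper than re-running a full random initialization before every iteration.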