NVIDIA-developer-blog / code-samples

Source code examples from the Parallel Forall Blog
BSD 3-Clause "New" or "Revised" License
1.24k stars, 633 forks

fix the output error check in tensor cores wmma sample #51

Closed. mdoijade closed this pull request 1 year ago

mdoijade commented 1 year ago

- Fix the output error check to use relative error (rel_error).
- Fix the cuBLAS time reporting issue caused by startup time being added to the measurement.
- Add an option in the Makefile to specify the arch at build time.
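For readers unfamiliar with the first item, here is a minimal host-side sketch of what a relative-error check can look like. It is not the code from this PR: the names `rel_error` and `count_mismatches`, the `eps` guard, and the tolerance values are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not taken from the PR): relative error of a computed
// value against a reference value. eps is an assumed guard that avoids
// dividing by zero when the reference is exactly zero.
inline float rel_error(float ref, float val, float eps = 1e-6f) {
    return std::fabs(ref - val) / std::max(std::fabs(ref), eps);
}

// Hypothetical helper: counts the elements whose relative error exceeds tol.
// Only the overlapping prefix of the two vectors is compared.
inline std::size_t count_mismatches(const std::vector<float>& ref,
                                    const std::vector<float>& out,
                                    float tol) {
    std::size_t errors = 0;
    const std::size_t n = std::min(ref.size(), out.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (rel_error(ref[i], out[i]) > tol) {
            ++errors;
        }
    }
    return errors;
}
```

The point of dividing by the reference magnitude is that the tolerance scales with the size of the values being compared, which matters when half-precision inputs accumulate into large outputs. An absolute threshold that suits values near 1 would likely flag values near 1000 that agree to several significant digits.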

mdoijade commented 1 year ago

@harrism, could you please review this PR and help merge it?