Open Ageliss opened 8 months ago
Hi, unfortunately, I don't have access to any H800s (or any Hopper GPUs for that matter), so it is a bit hard to test. Which of the matrix shapes are failing and by how much? Can you perhaps print the result of this line for all test cases, i.e., what is the relative average error?
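In case it helps with reporting numbers, here is a minimal sketch of one common way to compute a relative average error: the mean absolute difference divided by the mean absolute value of the reference. This is an assumption about what the metric looks like, not a copy of the test line referred to above. The function name `relative_average_error` and the sample matrices are hypothetical, and plain nested lists stand in for tensors so the snippet runs without a GPU.

```python
def relative_average_error(result, reference):
    """Mean absolute difference divided by the mean absolute reference value.

    Both arguments are 2-D nested lists of floats with the same shape
    (hypothetical stand-ins for the kernel output and the reference output).
    """
    flat_res = [x for row in result for x in row]
    flat_ref = [x for row in reference for x in row]
    if len(flat_res) != len(flat_ref):
        raise ValueError("result and reference must have the same number of elements")

    n = len(flat_ref)
    mean_abs_diff = sum(abs(a - b) for a, b in zip(flat_res, flat_ref)) / n
    mean_abs_ref = sum(abs(b) for b in flat_ref) / n
    return mean_abs_diff / mean_abs_ref


# Made-up example data: one element out of four is off by 1.0.
reference = [[1.0, -2.0], [3.0, -4.0]]
result = [[1.0, -2.0], [3.0, -3.0]]

err = relative_average_error(result, reference)
print(f"relative average error: {err:.4f}")
```

Printing this value for each failing shape, rather than only the pass/fail result, would show whether the mismatch is a small numerical drift or a completely wrong output.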
Yes, with thread_shape = [64, 256], I get the correct result:
However, with [128, 128], I get an error:
@Ageliss Which CUDA version was the failing test run on? Can you retest on the latest CUDA 12.4 and/or PyTorch 2.2.2?
This setup cannot pass the unit tests. Could you please check it?