Open chenhongyu2048 opened 2 months ago
UPDATE: the run does not report an error (the computation finishes), but cutlass::reference::host::TensorEquals fails.
Maybe such an error is an accuracy issue?
After further debugging, we found that the failure is caused by some values of the computed result differing from tensor_ref_d. We added the following code:
    ElementOutput sum = (ElementOutput)0;
    ElementOutput *d_ptr = tensor_d.host_data();
    ElementOutput *ref_d_ptr = tensor_ref_d.host_data();
    // Walk all 32 x 12288 output elements and print every non-zero difference.
    for (int i = 0; i < 32 * 12288; ++i) {
      sum += *(d_ptr + i) - *(ref_d_ptr + i);
      if (*(d_ptr + i) - *(ref_d_ptr + i) != 0) {
        std::cout << i << " " << *(d_ptr + i) - *(ref_d_ptr + i) << std::endl;
      }
    }
    std::cout << sum << std::endl;
and got the following result: at i=60390 it prints -4, which is the difference between *(d_ptr+i) and *(ref_d_ptr+i).
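If this really is an accuracy issue, an exact comparison will flag it even when the values are close in relative terms. A minimal sketch of a tolerance-based check over two host buffers follows. The names Mismatch and find_mismatches are hypothetical helpers written for this post, not part of CUTLASS, and the sketch assumes the element type can be converted with static_cast<double>.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper type (not a CUTLASS API): one element that differs.
struct Mismatch {
  std::size_t index;   // flat index into the buffer
  double got;          // value produced by the kernel under test
  double expected;     // value from the reference computation
};

// Hypothetical helper (not a CUTLASS API): returns every index where
// |got - expected| exceeds abs_tol + rel_tol * |expected|.
// Assumes T converts to double via static_cast.
template <typename T>
std::vector<Mismatch> find_mismatches(const T* got, const T* expected,
                                      std::size_t count,
                                      double rel_tol, double abs_tol) {
  assert(rel_tol >= 0.0 && abs_tol >= 0.0);
  std::vector<Mismatch> out;
  for (std::size_t i = 0; i < count; ++i) {
    const double g = static_cast<double>(got[i]);
    const double e = static_cast<double>(expected[i]);
    const double limit = abs_tol + rel_tol * std::fabs(e);
    // Written as !(diff <= limit) so that a NaN on either side is reported.
    if (!(std::fabs(g - e) <= limit)) {
      out.push_back({i, g, e});
    }
  }
  return out;
}
```

With the tensors from the snippet above, and assuming the device result has already been copied to the host, a call could look like find_mismatches(tensor_d.host_data(), tensor_ref_d.host_data(), 32 * 12288, 1e-2, 0.0). Printing got and expected for each returned entry, rather than only their difference, shows whether -4 is small or large relative to the values involved. The tolerance of 1e-2 is only a placeholder; a suitable value depends on the element and accumulator types.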
When I use the CUTLASS templates to write my own GEMM kernel, I get an Internal error, even though I follow the settings provided by the CUTLASS profiler.
The full code is as below:
The above setting is provided by cutlass profiler:
I compiled it with:
nvcc -std=c++17 -arch=sm_80 -I/xxx/third_party/cutlass/include -I/xxx/third_party/cutlass/tools/util/include -I/xxx/third_party/cutlass/tools/library/include -I/xxx/third_party/cutlass/examples/common -lcublas ./cutlass_gemm.cu --expt-relaxed-constexpr -o cutlass_gemm_example
I use CUDA 12.6 and an RTX 6000 Ada GPU. I'd like to know: is this an issue with the way I'm using it?