gohar94 opened this issue 1 week ago (status: Open)
Have you correlated with `real.hpp`? That file is where the basic `fma` is defined. I experimented using `gemm_tn` and I get the below, which is expected, I think.
From the above, I do not see any HFMA instructions. It seems `cute` does not dispatch to those, but rather to HADD, FADD and HMUL.
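To make the point about separate multiply and add instructions concrete, here is a minimal host-side C++ sketch. It assumes the basic fallback is written as a plain `d = a * b + c`, which is worth checking against `real.hpp` itself. The names `generic_fma` and `fused_fma` are hypothetical and are not CuTe's API. The idea is only that a multiply-add spelled with `*` and `+` leaves fusion up to the compiler and its flags (a debug build may well keep them separate), while an explicit fused call asks for a single fused operation.

```cpp
#include <cmath>

// Hypothetical stand-in for a generic fallback multiply-add: d = a * b + c.
// The source expresses a separate multiply and add; whether they get
// contracted into one fused instruction is left to the compiler and flags.
template <class D, class A, class B, class C>
constexpr void generic_fma(D& d, A const& a, B const& b, C const& c) {
  d = a * b + c;
}

// Accumulating form: c = a * b + c, forwarding to the four-argument version.
template <class A, class B, class C>
constexpr void generic_fma(A const& a, B const& b, C& c) {
  generic_fma(c, a, b, c);
}

// Explicitly fused variant for float: std::fma computes a * b + c
// with a single rounding step.
inline float fused_fma(float a, float b, float c) {
  return std::fma(a, b, c);
}
```

For example, with `float acc = 1.0f;`, calling `generic_fma(2.0f, 3.0f, acc);` leaves `acc` holding `2 * 3 + 1`. Both forms agree on values like these; they differ in which instructions the compiler is asked for, which is the thing to look at in the SASS.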
On the other hand, I verified the HMMA SASS by experimenting with half x half = float. I get the below, which is expected, as `cute::gemm` dispatches to this mma instruction.
What is your question?
I am running the `sgemm_sm80.cu` example (debug build) with NCU. I want to see the part of the kernel that ends up executing the `HFMA2` instruction and do some analysis on that. My understanding is that the `gemm` call in the main loop should end up as the `HFMA2`, but based on what I see in NCU, that piece of code is not showing me any associated SASS code. Instead, the `HFMA2` seems to be stemming from some copy-atom-related code. Is that expected? If so, why is the `gemm` call not ending up as the `HFMA2` instruction?
Below are two relevant screenshots: