Open lw921014 opened 2 years ago
Describe the Bug
For the mask op here https://github.com/NVIDIA/apex/blob/master/apex/contrib/csrc/fmha/src/fmha/mask.h#L54: if we use the SM80 m16n8k16 tensor core, should this be changed as follows?
col = warp_n * 32 + tid;
Minimal Steps/Code to Reproduce the Bug
Environment
Hello, every warp computes a 16x16 tile, so this column offset should be correct as-is.
@lw921014 can we close if that answered your question, or is there something else regarding this we can help with?
I got it. Thanks a lot.
I have another question. According to this code, does the current implementation only support head size = 64? What about head size = 32 or 16?