BERT model with a self-defined attention mask:
I modified the input parameters in FusedAttentionLayer.cu as in the following line, but the final result doesn't change.
dispatcher_fp16->run(qkv_buf_, attention_mask, padding_offset, attn_workspace_, qkv_buf_2_, stream_);
Branch/Tag/Commit
main
Docker Image Version
nvcr.io/nvidia/pytorch:21.04-py3
GPU name
3090
CUDA Driver
525.89.02
Reproduced Steps