Open finbarrtimbers opened 2 weeks ago
From the TransformerEngine docs:
mask in call is ignored for ‘no_mask’ and ‘causal’.
I think that attn_mask_type should be set to causal_padding, or else it'll ignore the mask being passed on line 370.
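To make the concern concrete, here is a minimal pure-Python sketch of the behaviour the docs describe. It is not TransformerEngine's implementation: `effective_mask`, its `padding_mask` argument, and the "True means masked out" convention are all hypothetical, and the mode string `causal_padding` is taken from this issue, so the exact spelling the library accepts should be checked against its docs.

```python
def effective_mask(seq_len, mask_type, padding_mask=None):
    """Return a seq_len x seq_len grid where True means "masked out".

    Hypothetical illustration only; not TransformerEngine's API.
    padding_mask is a list of bools per key position (True = padded).
    """
    # Causal part: a query at position q may not attend to keys k > q.
    causal = [[k > q for k in range(seq_len)] for q in range(seq_len)]

    if mask_type == "no_mask":
        # Any mask passed by the caller is ignored.
        return [[False] * seq_len for _ in range(seq_len)]
    if mask_type == "causal":
        # The caller's mask is ignored here too; only causality applies.
        return causal
    if mask_type == "causal_padding":
        # Mode name as written in this issue (assumption).
        if padding_mask is None:
            raise ValueError("causal_padding requires a padding mask")
        return [
            [causal[q][k] or padding_mask[k] for k in range(seq_len)]
            for q in range(seq_len)
        ]
    raise ValueError(f"unknown mask_type: {mask_type!r}")


# Last two of four tokens are padding.
padding = [False, False, True, True]
causal_only = effective_mask(4, "causal", padding)
combined = effective_mask(4, "causal_padding", padding)
```

With `causal`, the last query position can still attend to the padded key at index 2, because the passed mask never enters the computation. With the combined mode, that position is masked out.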
Thanks, Finbarr. I've created a PR to address this.