FacePoluke opened this issue 1 year ago
The function returns a boolean mask, so the placeholder value doesn't really matter. When the transformer computes attention, positions that hold a non-zero value (i.e. True in the returned matrix) are masked out, which is equivalent to adding -inf to the attention score before the softmax.
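
To make that equivalence concrete, here is a minimal sketch (assuming a PyTorch-style attention computation; the mask, score tensor, and variable names below are illustrative, not taken from the function under discussion): filling the True entries of a boolean causal mask with -inf before the softmax yields the same attention weights as adding a 0/-inf float mask to the scores.

```python
import torch

seq_len = 4

# Boolean causal mask: True marks positions to be masked out
# (strictly upper triangle, i.e. future positions).
bool_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

# Illustrative raw attention scores.
scores = torch.randn(seq_len, seq_len)

# Boolean semantics: True positions become -inf before the softmax...
attn_from_bool = torch.softmax(scores.masked_fill(bool_mask, float("-inf")), dim=-1)

# ...which gives the same weights as an additive float mask of 0 / -inf.
float_mask = torch.zeros(seq_len, seq_len).masked_fill(bool_mask, float("-inf"))
attn_from_float = torch.softmax(scores + float_mask, dim=-1)

assert torch.allclose(attn_from_bool, attn_from_float)
```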
The values off the diagonal are 1, though, not -inf.