Open c120129754 opened 3 years ago
A mask is used when selecting the topK nodes, but it is applied by addition rather than by multiplication, which is the more common way of using a mask. Could you please explain this choice, or give an example of how the mask influences the calculated score in the topK selection?
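To make the question concrete, here is a minimal sketch of the two masking styles I am comparing. It is plain Python with made-up scores and a hypothetical `top_k` helper, not code from this repository. It assumes the additive mask holds 0 for selectable nodes and a very large negative value (here `-inf`) for the others, which may not be what the repository actually does.

```python
import math

def top_k(scores, k):
    """Return the indices of the k largest scores, highest first (hypothetical helper)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Made-up per-node scores; note that some are negative.
scores = [-0.5, -2.0, 0.3, -1.0]
# valid[i] is True when node i is allowed to be selected.
valid = [True, False, False, True]

# Additive mask (assumed form): add 0 to valid nodes and -inf to the rest.
additive = [s + (0.0 if v else -math.inf) for s, v in zip(scores, valid)]

# Multiplicative mask: multiply valid nodes by 1 and the rest by 0.
multiplicative = [s * (1.0 if v else 0.0) for s, v in zip(scores, valid)]

top_additive = top_k(additive, 2)
top_multiplicative = top_k(multiplicative, 2)

print(top_additive)                # [0, 3]: only valid nodes are selected
print(sorted(top_multiplicative))  # [1, 2]: the masked-out nodes are selected
```

In this toy case the multiplicative mask turns the excluded scores into 0, which ranks above the negative scores of the valid nodes, so the excluded nodes end up being chosen. Is this the reason addition is used here, or does the mask play a different role?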