Open vhientran opened 5 years ago
Could you please let me know whether this model architecture calculates attention weights for padding tokens?