Description
This layer is an extension of the existing ParametricAttention layer, adding support for transformations (such as a non-linear layer) of the key representation. This brings the model closer to the paper that proposed it (Yang et al., 2016) and gave slightly better results in experiments.
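For context, here is a minimal NumPy sketch of the mechanism this extension targets, not the actual Thinc implementation: the key representation is passed through a transform (here a tanh layer, as in Yang et al., 2016) before the attention weights are computed against the learned query vector. The function and parameter names (`parametric_attention`, `W`, `b`, `v`) are illustrative only.

```python
import numpy as np

def parametric_attention(X, W, b, v):
    """Toy forward pass: attention with a transformed key representation."""
    # Key transform (Yang et al., 2016): u_t = tanh(W x_t + b)
    keys = np.tanh(X @ W.T + b)
    # Attention logits: similarity of each transformed key to the learned query v
    logits = keys @ v
    # Softmax over the tokens in the sequence
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Re-weight each token's representation by its attention weight
    return X * weights[:, None]

# Example usage with random parameters
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))   # 5 tokens, width 8
W = rng.normal(size=(8, 8))   # key-transform weights
b = rng.normal(size=(8,))     # key-transform bias
v = rng.normal(size=(8,))     # learned query / context vector
print(parametric_attention(X, W, b, v).shape)  # (5, 8)
```

Without the key transform, the logits are computed directly from the token vectors, which is what the existing ParametricAttention layer does.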
Types of change
Feature
Checklist
[x] I confirm that I have the right to submit this contribution under the project's MIT license.
[x] I ran the tests, and all new and existing tests passed.
[x] My changes don't require a change to the documentation, or if they do, I've added all required information.