Closed Bestlzz closed 2 months ago
Hi, I have two questions:

Do you need to add an eRPE to each layer? Yes, an eRPE is added to each layer of the Transformer.

Are the eRPEs the same one? No, the eRPEs are not shared across layers. Each layer learns its own set of parameters, since different layers may encode different abstractions of the data.
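The per-layer setup described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual implementation: the class name `eRPE`, the bias-table layout, and the use of NumPy instead of a deep learning framework are all assumptions. It shows one learnable relative-position bias table per layer, added to the attention score matrix, with no parameter sharing between layers.

```python
import numpy as np

class eRPE:
    """Hypothetical sketch of an efficient relative position embedding:
    one learnable scalar bias per relative offset, added to the
    attention scores before the softmax."""

    def __init__(self, seq_len, rng):
        # Relative offsets range over [-(L-1), L-1], so 2L-1 entries.
        self.bias = rng.normal(size=2 * seq_len - 1)

    def __call__(self, scores):
        # scores: (L, L) attention score matrix for one head.
        L = scores.shape[-1]
        # Map each (query, key) pair to its relative-offset index.
        idx = np.arange(L)[:, None] - np.arange(L)[None, :] + (L - 1)
        return scores + self.bias[idx]

rng = np.random.default_rng(0)
seq_len, n_layers = 8, 4

# Each Transformer layer owns its own eRPE instance, so each layer's
# bias table is a separate set of parameters (no sharing).
layers = [eRPE(seq_len, rng) for _ in range(n_layers)]
```

Because every layer constructs its own `eRPE`, the bias tables are independent parameters and would receive independent gradient updates during training.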