The goal of an attention layer is to select the important parts of the input to consider when generating output. How important each part of the input is depends on the input data itself, so x is fed in to compute the context.
Additionally, the network's context should vary over time: if x were the only input, the context would be static. That is why this tutorial places an RNN layer on top of the attention mechanism, whose output is (somewhat confusingly) called y. y(t-1) is the previous RNN output, and feeding it into the context computation gives the model a time-varying mechanism for shifting its attention from step to step. A sketch of this idea follows.
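Here is a minimal NumPy sketch of that idea (not the tutorial's actual code; the function and weight names `attention_context`, `W_x`, `W_y`, `v` and the dimensions are invented for illustration). It shows an additive-attention-style score computed from both x and y(t-1), so the resulting weights change at every time step:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_context(x, y_prev, W_x, W_y, v):
    """Compute a context vector from the inputs and the previous RNN output.

    x      : (T, d_x)   sequence of input vectors (one per position)
    y_prev : (d_y,)     previous RNN output y(t-1)
    W_x    : (d_x, d_a) projection of the inputs
    W_y    : (d_y, d_a) projection of the previous output
    v      : (d_a,)     scoring vector

    Returns (context, weights): context is a weighted sum of the rows
    of x, and weights shows where the attention is currently focused.
    """
    # Score each input position using both x and y(t-1); because y(t-1)
    # changes at every step, the attention weights change over time too.
    scores = np.tanh(x @ W_x + y_prev @ W_y) @ v   # (T,)
    weights = softmax(scores)                      # (T,), sums to 1
    context = weights @ x                          # (d_x,)
    return context, weights

# Tiny usage example with random weights (hypothetical dimensions).
rng = np.random.default_rng(0)
T, d_x, d_y, d_a = 5, 4, 3, 6
x = rng.normal(size=(T, d_x))
y_prev = rng.normal(size=(d_y,))
ctx, w = attention_context(
    x, y_prev,
    W_x=rng.normal(size=(d_x, d_a)),
    W_y=rng.normal(size=(d_y, d_a)),
    v=rng.normal(size=(d_a,)),
)
print(w)    # attention weights over the 5 input positions
print(ctx)  # context vector used when producing output at step t
```

Dropping `y_prev` from the score would make `weights` identical at every step, which is exactly the "static context" problem described above.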
Does this answer the question?
What are x, y, and t-1 in Dense(x, y(t-1))?