Open NareshPS opened 1 year ago
Split the input sequence into blocks. Each block is processed by two self-attentions. The blocks themselves are then pooled into a second-level sequence, which also uses two self-attentions. One of the self-attentions is shared between the two levels.
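A minimal NumPy sketch of how I read the proposal. The block size, mean-pooling of blocks into second-level tokens, and the exact ordering of the private vs. shared attention are all assumptions not stated above:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # model dimension (assumed for illustration)

def make_params(d=D):
    """Random Q/K/V projection matrices for one self-attention."""
    return tuple(rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))

def self_attention(x, params):
    """Plain scaled dot-product self-attention over a (tokens, d) array."""
    Wq, Wk, Wv = params
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = (q @ k.T) / np.sqrt(x.shape[-1])
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w = w / w.sum(-1, keepdims=True)
    return w @ v

def two_level_attention(seq, block_size, level1_p, level2_p, shared_p):
    """Each level applies its own attention plus a shared one;
    block summaries (mean-pooled, an assumption) form the level-2 sequence."""
    n, d = seq.shape
    assert n % block_size == 0, "sequence length must divide into blocks"
    blocks = seq.reshape(n // block_size, block_size, d)
    # Level 1: two self-attentions per block (one private, one shared).
    level1 = np.stack([self_attention(self_attention(b, level1_p), shared_p)
                       for b in blocks])
    # Level 2: one summary token per block, then the same two-attention
    # pattern with the shared weights reused across levels.
    summaries = level1.mean(axis=1)                # (num_blocks, d)
    level2 = self_attention(self_attention(summaries, level2_p), shared_p)
    return level1, level2
```

With a length-12 sequence and block size 4, level 1 produces three attended blocks of four tokens and level 2 a three-token summary sequence; the `shared_p` weights are the ones applied at both levels.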