Open NareshPS opened 1 year ago
Split the input sequence into blocks. Each block is processed by two self-attentions. The blocks themselves are then pooled into a second-level sequence, which also uses two self-attentions. One of the self-attentions is shared between the two levels.
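A minimal NumPy sketch of how I read the proposal. The block size, mean-pooling of blocks into second-level tokens, and the exact ordering of the private vs. shared attention are all assumptions not stated above:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # model dimension (assumed for illustration)

def make_params(d=D):
    """Random Q/K/V projection matrices for one self-attention."""
    return tuple(rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))

def self_attention(x, params):
    """Plain scaled dot-product self-attention over a (tokens, d) array."""
    Wq, Wk, Wv = params
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = (q @ k.T) / np.sqrt(x.shape[-1])
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w = w / w.sum(-1, keepdims=True)
    return w @ v

def two_level_attention(seq, block_size, level1_p, level2_p, shared_p):
    """Each level applies its own attention plus a shared one;
    block summaries (mean-pooled, an assumption) form the level-2 sequence."""
    n, d = seq.shape
    assert n % block_size == 0, "sequence length must divide into blocks"
    blocks = seq.reshape(n // block_size, block_size, d)
    # Level 1: two self-attentions per block (one private, one shared).
    level1 = np.stack([self_attention(self_attention(b, level1_p), shared_p)
                       for b in blocks])
    # Level 2: one summary token per block, then the same two-attention
    # pattern with the shared weights reused across levels.
    summaries = level1.mean(axis=1)                # (num_blocks, d)
    level2 = self_attention(self_attention(summaries, level2_p), shared_p)
    return level1, level2
```

With a length-12 sequence and block size 4, level 1 produces three attended blocks of four tokens and level 2 a three-token summary sequence; the `shared_p` weights are the ones applied at both levels.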