RodkinIvan / associative-recurrent-memory-transformer

[ICML 24 NGSM workshop] Associative Recurrent Memory Transformer implementation and scripts for training and evaluating
Apache License 2.0

Bug in ARMT? #1

Open TachikakaMin opened 3 weeks ago

TachikakaMin commented 3 weeks ago

https://github.com/RodkinIvan/associative-recurrent-memory-transformer/blob/aa145de9f50c08778e2579a2130b3db0d379bce5/modeling_amt/language_modeling.py#L157

Hi,

Thanks for the code.

Is this a bug? With this indexing, each layer won't have any connection with the other layers.

Should it be [i-1]?

@RodkinIvan

RodkinIvan commented 3 weeks ago

Hello,

Thank you for your interest in ARMT.

No, it's not a bug. Note that the line you refer to is in the __init__ method, where we wrap each transformer layer with associative memory. Since all the layers are substituted with the wrapped ones, the transformer calls the wrapped layers during the forward pass: before each layer is called, the input is associated with the association matrix, and after the layer is called, the associative state is updated.

So it should be [i], because each wrapper corresponds to that particular layer.
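For illustration, here is a minimal sketch of that wrapping pattern (not the actual repository code; the class and method names below are hypothetical placeholders): in __init__, layer i is paired with its own associative memory, which is why the index is [i] rather than [i-1].

```python
import torch
import torch.nn as nn


class AssociativeLayerWrapper(nn.Module):
    """Hypothetical wrapper: associates the input with the association
    matrix before the layer call and updates the associative state after."""

    def __init__(self, layer):
        super().__init__()
        self.layer = layer
        self.associative_state = None  # placeholder for the association matrix

    def associate(self, hidden_states):
        # In the real model this would read from the association matrix;
        # here it is a no-op placeholder.
        return hidden_states

    def update_state(self, hidden_states):
        # In the real model this would update the associative state.
        pass

    def forward(self, hidden_states, *args, **kwargs):
        hidden_states = self.associate(hidden_states)
        outputs = self.layer(hidden_states, *args, **kwargs)
        out = outputs if isinstance(outputs, torch.Tensor) else outputs[0]
        self.update_state(out)
        return outputs


def wrap_layers(transformer_layers):
    # Each layer i is wrapped with its *own* memory, hence layers[i], not layers[i-1].
    return nn.ModuleList(
        [AssociativeLayerWrapper(transformer_layers[i]) for i in range(len(transformer_layers))]
    )
```

Because the wrapped layers replace the originals in the backbone, the cross-layer information flow is unchanged; the wrapper only adds the per-layer associative read/update around each layer call.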

Feel free to ask if you need any further clarifications.

@TachikakaMin