RodkinIvan / associative-recurrent-memory-transformer

[ICML 24 NGSM workshop] Associative Recurrent Memory Transformer implementation and scripts for training and evaluating
Apache License 2.0

Bug in ARMT? #1

Open TachikakaMin opened 3 weeks ago

TachikakaMin commented 3 weeks ago

https://github.com/RodkinIvan/associative-recurrent-memory-transformer/blob/aa145de9f50c08778e2579a2130b3db0d379bce5/modeling_amt/language_modeling.py#L157

Hi,

Thanks for the code.

Is this a bug? With this indexing, each layer won't have any connection with the other layers.

Should it be [i-1]?

@RodkinIvan

RodkinIvan commented 3 weeks ago

Hello,

Thank you for your interest in ARMT.

No, it's not a bug. Note that the line you refer to is in the __init__ method, where we wrap each transformer layer with associative memory. Since all the layers are substituted with the wrapped ones, the transformer calls the wrapped layers during the forward pass: before each layer is called, the input is associated with the association matrix, and after the layer is called, the associative state is updated.

So it should be [i], because each wrapper corresponds to that particular layer.
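For illustration, here is a minimal sketch of that wrapping pattern (not the actual repository code; the class and method names below are hypothetical placeholders): in __init__, layer i is paired with its own associative memory, which is why the index is [i] rather than [i-1].

```python
import torch
import torch.nn as nn


class AssociativeLayerWrapper(nn.Module):
    """Hypothetical wrapper: associates the input with the association
    matrix before the layer call and updates the associative state after."""

    def __init__(self, layer):
        super().__init__()
        self.layer = layer
        self.associative_state = None  # placeholder for the association matrix

    def associate(self, hidden_states):
        # In the real model this would read from the association matrix;
        # here it is a no-op placeholder.
        return hidden_states

    def update_state(self, hidden_states):
        # In the real model this would update the associative state.
        pass

    def forward(self, hidden_states, *args, **kwargs):
        hidden_states = self.associate(hidden_states)
        outputs = self.layer(hidden_states, *args, **kwargs)
        out = outputs if isinstance(outputs, torch.Tensor) else outputs[0]
        self.update_state(out)
        return outputs


def wrap_layers(transformer_layers):
    # Each layer i is wrapped with its *own* memory, hence layers[i], not layers[i-1].
    return nn.ModuleList(
        [AssociativeLayerWrapper(transformer_layers[i]) for i in range(len(transformer_layers))]
    )
```

Because the wrapped layers replace the originals in the backbone, the cross-layer information flow is unchanged; the wrapper only adds the per-layer associative read/update around each layer call.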

Feel free to ask if you need any further clarifications.

@TachikakaMin