Open · TachikakaMin opened this issue 3 weeks ago

RodkinIvan replied:

Hello,
Thank you for your interest in ARMT.
No, it's not a bug. Note that you are referring to the __init__
method. Here we wrap each transformer layer with associative memory. Since we substituted all the layers with the wrapped ones, the transformer calls the wrapped layers during the forward pass: before calling each layer, it associates with the association matrix, and after calling each layer, it updates the associative state.
So it should be [i], because we wrap that particular layer.
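For illustration, here is a minimal sketch of the wrapping pattern, not the actual ARMT code: the names AssociativeLayerWrapper, W_mem, and ARMTLikeModel are placeholders, and the associate/update rules are deliberately simplified. See the linked file for the real implementation.

```python
import torch
import torch.nn as nn


class AssociativeLayerWrapper(nn.Module):
    """Wraps one transformer layer together with its own associative state."""

    def __init__(self, layer: nn.Module, d_model: int):
        super().__init__()
        self.layer = layer  # the wrapped transformer layer
        # per-layer association matrix (this layer's associative state)
        self.register_buffer("W_mem", torch.zeros(d_model, d_model))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # before calling the layer: associate with the association matrix
        hidden_states = hidden_states + hidden_states @ self.W_mem
        # call the wrapped layer itself
        hidden_states = self.layer(hidden_states)
        # after calling the layer: update the associative state
        # (a simplified outer-product update, just for illustration)
        self.W_mem = self.W_mem + torch.einsum(
            "bsd,bse->de", hidden_states, hidden_states
        )
        return hidden_states


class ARMTLikeModel(nn.Module):
    """Substitutes every transformer layer with its wrapped version."""

    def __init__(self, layers: nn.ModuleList, d_model: int):
        super().__init__()
        self.layers = layers
        for i in range(len(self.layers)):
            # [i] is correct: wrapper i must wrap layer i, because each layer
            # keeps its own association matrix; [i-1] would pair layer i's
            # memory with a different layer.
            self.layers[i] = AssociativeLayerWrapper(self.layers[i], d_model)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            hidden_states = layer(hidden_states)
        return hidden_states
```

Since each wrapper owns its own association matrix, layer i always reads and updates memory i; the connection between layers still comes from the hidden states flowing through the stack in the usual forward pass, not from the memory indices.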
Feel free to ask if you need any further clarification.
@TachikakaMin
Original issue from TachikakaMin:

Hi,
Thanks for the code.

https://github.com/RodkinIvan/associative-recurrent-memory-transformer/blob/aa145de9f50c08778e2579a2130b3db0d379bce5/modeling_amt/language_modeling.py#L157

Is this a bug? With [i], it looks like each layer has no connection with the other layers.
Should it be [i-1]?
@RodkinIvan