We didn't modify the backbone model or inject any extra manipulation into it. I don't think the official implementation does that either, nor does the original RMT paper mention any technique related to RoPE. Is there any explanation of why they push the segment tokens out to 10000 in this alternate implementation?
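For what it's worth, here is a minimal sketch of what "no extra manipulation" amounts to under RoPE, assuming the memory tokens are simply concatenated with the segment and receive ordinary sequential position ids (the sizes below are made up for illustration):

```python
import torch

# Illustrative sizes only; the real values depend on the model config.
num_read_mem, seg_len, num_write_mem = 16, 512, 16
total_len = num_read_mem + seg_len + num_write_mem

# With no special handling, RoPE position ids just run sequentially
# over the concatenated [read mem][segment][write mem] sequence.
position_ids = torch.arange(total_len)

read_pos = position_ids[:num_read_mem]                        # 0 .. 15
seg_pos = position_ids[num_read_mem:num_read_mem + seg_len]   # 16 .. 527
write_pos = position_ids[-num_write_mem:]                     # 528 .. 543
```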
Thanks. In your implementation, are the write tokens always at the same position in the sequence? Say we have [read mem][segment][write mem]. Is [segment] always the same length?
> Is there any explanation of why they push the segment tokens out to 10000 in this alternate implementation?
I didn't see any explanation. I filed an issue here https://github.com/lucidrains/recurrent-memory-transformer-pytorch/issues/24, but haven't received a response.
Yes, the segment length is fixed.
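So, assuming the fixed layout above, the write tokens occupy the same absolute positions at every recurrence step, which means they see the same RoPE rotation each step (hypothetical sizes again, just a sketch):

```python
import torch

num_read_mem, seg_len, num_write_mem = 16, 512, 16

def write_positions(step: int) -> torch.Tensor:
    # Positions are local to each segment's forward pass, so with a fixed
    # segment length they do not depend on the recurrence step.
    start = num_read_mem + seg_len
    return torch.arange(start, start + num_write_mem)

assert torch.equal(write_positions(step=0), write_positions(step=5))
```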
I'm curious whether you had to do anything special to the positional encoding of the read / write memories when using RoPE. In a different RMT implementation (https://github.com/lucidrains/recurrent-memory-transformer-pytorch/blob/35cd18deeb7965491873fcba4a15d581106eae39/recurrent_memory_transformer_pytorch/recurrent_memory_transformer.py#L409), the read / write tokens are assigned position 0 and the segment tokens' starting position is pushed out to 10000.
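For reference, a rough sketch of the position-id pattern described above (not that library's actual code, and the sizes are made up):

```python
import torch

# Memory tokens sit at position 0; the segment's positions are offset
# by a large constant, as described in the linked implementation.
num_mem, seg_len, offset = 16, 512, 10000

mem_pos = torch.zeros(num_mem, dtype=torch.long)       # all memory tokens at position 0
seg_pos = torch.arange(seg_len) + offset                # 10000 .. 10511
position_ids = torch.cat([mem_pos, seg_pos, mem_pos])   # [read mem][segment][write mem]
```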