Beomi / InfiniTransformer

Unofficial PyTorch/🤗 Transformers (Gemma/Llama3) implementation of "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention"
https://arxiv.org/abs/2404.07143

About memory missing location information #23

Open LzhinFdu opened 6 months ago

LzhinFdu commented 6 months ago

I noticed that the memory retrieval and update happen before 'apply_rotary_pos_emb'. I'm wondering whether the memory lacking positional information would confuse the model's perception of the order of historical information.
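For context, here is a minimal sketch (not the repo's actual code) of the ordering the question refers to, following the compressive-memory formulation in the paper: the memory is read and written with the pre-RoPE queries/keys, and only the local dot-product attention sees rotary positions. The tensor shapes, the 'elu_plus_one' feature map, the fixed 0.5 gate, and the 'apply_rotary_pos_emb' callable are assumptions for illustration.

```python
import torch.nn.functional as F

def elu_plus_one(x):
    # Non-negative feature map sigma(.) used by the linear-attention memory.
    return F.elu(x) + 1.0

def infini_attn_segment(q, k, v, M, z, cos, sin, apply_rotary_pos_emb):
    # q, k, v: (batch, heads, seq, head_dim) for the current segment
    # M: (batch, heads, head_dim, head_dim) compressive memory
    # z: (batch, heads, head_dim, 1) normalization term

    # 1) Memory retrieval BEFORE RoPE -- the queries carry no position info here.
    sigma_q = elu_plus_one(q)
    A_mem = (sigma_q @ M) / (sigma_q @ z + 1e-6)

    # 2) Memory update BEFORE RoPE -- the keys are also position-free.
    sigma_k = elu_plus_one(k)
    M = M + sigma_k.transpose(-2, -1) @ v
    z = z + sigma_k.sum(dim=-2, keepdim=True).transpose(-2, -1)

    # 3) Only now is RoPE applied, so only the local dot-product attention
    #    within the segment sees positions.
    q, k = apply_rotary_pos_emb(q, k, cos, sin)
    A_local = F.scaled_dot_product_attention(q, k, v, is_causal=True)

    # 4) The paper mixes A_mem and A_local with a learned per-head gate;
    #    a fixed 0.5 is used here purely for illustration.
    out = 0.5 * A_mem + 0.5 * A_local
    return out, M, z
```

In this ordering the memory only ever stores position-free key/value summaries, which is exactly what the question is pointing at.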

Lazy3valuation commented 6 months ago

From the README: "Can train 'infinite' context -- check train.gemma.infini.noclm.1Mseq.sh with 1x H100 80G (with AdamW optimizer, No gradient checkpointing)". However, I can train it with 12 GB of VRAM using 8-bit quantization and a segment size of 400.
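For reference, a minimal sketch of the kind of 8-bit load described above, using the standard 🤗 Transformers / bitsandbytes API. The model id and the segment-size variable are assumptions; the repo exposes its own training scripts and arguments, and the exact fine-tuning setup (full fine-tune vs. adapters) isn't specified in the comment.

```python
# Sketch only: load a Gemma checkpoint in 8-bit before running the repo's
# Infini-attention training loop. The segment size is a repo-specific
# training argument, shown here as a plain variable.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-2b"  # assumption; use the checkpoint you actually train
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

segment_length = 400  # segment size mentioned above; memory usage scales with it
```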

LzhinFdu commented 6 months ago

I can also get training to run. However, the current training results are not very good; I'm still trying to train it further.

pengshuang commented 4 months ago

I also have the same question. Were you able to solve it?

LzhinFdu commented 4 months ago

You can try moving the memory retrieval step to after 'apply_rotary_pos_emb' and compare the training performance. However, I did not try it further.
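A hedged sketch of that variant, reusing the 'elu_plus_one' helper and imports from the sketch earlier in the thread: RoPE is applied first, so the memory retrieval and update see the rotated Q/K. Whether carrying rotary phases from different segments into the shared memory actually helps is exactly what would need to be compared in training; this is untested here.

```python
def infini_attn_segment_rope_first(q, k, v, M, z, cos, sin, apply_rotary_pos_emb):
    # RoPE BEFORE any memory access, so the memory sees position-rotated Q/K.
    q, k = apply_rotary_pos_emb(q, k, cos, sin)

    sigma_q = elu_plus_one(q)                 # retrieval now uses rotated queries
    A_mem = (sigma_q @ M) / (sigma_q @ z + 1e-6)

    sigma_k = elu_plus_one(k)                 # update now uses rotated keys
    M = M + sigma_k.transpose(-2, -1) @ v
    z = z + sigma_k.sum(dim=-2, keepdim=True).transpose(-2, -1)

    A_local = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    out = 0.5 * A_mem + 0.5 * A_local         # fixed gate, illustration only
    return out, M, z
```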

pengshuang commented 4 months ago

Thanks for your response.

lihua8848 commented 3 months ago

Can this retain positional information?