datamllab / LongLM

[ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
https://arxiv.org/pdf/2401.01325.pdf
MIT License

Phi2 implementation and Suggestions #2

Closed agokrani closed 8 months ago

agokrani commented 8 months ago

Hi,

Loved this paper and implementation. I implemented this for Phi2 with transformers==4.36.2, without caching. The outputs within the context window follow instructions even better than the original model's. However, when going beyond the context window, I am seeing repetition. This might be due to extending the context window a bit too much. Do you have suggestions for experimenting with different group and neighbor sizes, or any other insights?

Here is my implementation: https://github.com/agokrani/LongLM/tree/phi2

I haven't implemented KV caching yet due to the change in the KV caching format in transformers. I will try to implement it soon. Would love to hear your thoughts.
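For reference, the core of Self-Extend is just a remapping of position ids: tokens inside a neighbor window keep their normal positions, while farther tokens get floor-divided (grouped) positions, shifted so the two pieces line up at the window boundary. A minimal, framework-free sketch of that mapping (the function name is mine):

```python
def self_extend_position_ids(seq_len, group_size, neighbor_window):
    """Return the two position-id sequences Self-Extend attends with:
    normal ids for the neighbor window, grouped (floored) ids beyond it."""
    normal = list(range(seq_len))
    # Shift grouped ids so they merge seamlessly with the normal ids
    # at the neighbor-window boundary.
    shift = neighbor_window - neighbor_window // group_size
    grouped = [p // group_size + shift for p in normal]
    return normal, grouped
```

Attention scores inside the neighbor window are computed with `normal`, the rest with `grouped`, and the two pieces are merged per query.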

Mooler0410 commented 8 months ago

Hi! Thanks for sharing this! We also observed what you mentioned: "the outputs within the context window are even better at following instructions than the original model". 😄 Considering Phi-2's context window is only 2k, we suggest using 386-1024 as the neighbor window. A group size of 3~8 should be good; maybe a larger group can work even better. We also implemented Phi-2 ourselves with KV cache, so you may check that out, but we haven't finished testing it!
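To see how far a given setting stretches the window: with pretrained length L, neighbor window w, and group size G, the maximum reachable context is (L - w) * G + w, since every grouped position beyond the neighbor window covers G tokens. A quick sanity check for Phi-2's 2k window (the helper name is mine):

```python
def max_extended_context(pretrained_len, group_size, neighbor_window):
    # Tokens beyond the neighbor window are addressed with grouped
    # positions, each covering `group_size` raw positions.
    return (pretrained_len - neighbor_window) * group_size + neighbor_window

# Phi-2 example: 2k pretrained window, neighbor 512, group 4
# gives (2048 - 512) * 4 + 512 = 6656 tokens.
```

This also explains the repetition when extending "a bit too much": picking G and w so the product barely covers your longest inputs keeps the grouped positions as fine-grained as possible.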

agokrani commented 8 months ago

Hey @Mooler0410,

Cool. At least this gives me hope that my implementation is headed in the right direction. Looking at the Phi2 implementation, the new grouped values are not added to the cache. I was actually thinking of adding this by extending the forward of the PhiModel class and using a custom cache class. Apparently they don't do anything with the cos/sin values they store in the cache. Maybe I am wrong; what do you think?
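If it helps, the shape of the custom-cache idea can be sketched without transformers at all: keep the raw (pre-rotary) keys and values per layer, so Self-Extend can apply either normal or grouped positions at attention time, and skip any cos/sin bookkeeping entirely. A toy, library-free sketch (class and method names are mine, not the transformers `Cache` API):

```python
class RawKVCache:
    """Toy per-layer cache storing keys *before* rotary embedding, so
    normal or grouped positions can be applied later at attention time."""

    def __init__(self):
        self.keys = {}    # layer_idx -> list of cached key blocks
        self.values = {}  # layer_idx -> list of cached value blocks

    def update(self, key_block, value_block, layer_idx):
        # Append the un-rotated keys/values and return everything
        # cached so far for this layer.
        self.keys.setdefault(layer_idx, []).append(key_block)
        self.values.setdefault(layer_idx, []).append(value_block)
        return self.keys[layer_idx], self.values[layer_idx]

    def seq_len(self, layer_idx=0):
        # Total cached tokens for a layer, assuming each block
        # is a list of per-token entries.
        return sum(len(block) for block in self.keys.get(layer_idx, []))
```

A real version would subclass the new transformers cache class and hold tensors, but the point is the same: the cache only needs the raw K/V, not the cos/sin values.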

Mooler0410 commented 8 months ago

You are correct! We were just too lazy to switch to the new cache class.

agokrani commented 8 months ago

Great 👍 will give this a shot. If I am successful, I will make a pull request.