datamllab / LongLM

[ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
https://arxiv.org/pdf/2401.01325.pdf
MIT License

Example for phi2? #22

Closed JoanZhou closed 3 months ago

JoanZhou commented 3 months ago

Could you please release the example script for Phi-2? Thanks.

Mooler0410 commented 3 months ago

It's similar to the Llama and Mistral examples. Just use the function modify_method_of_instance (in modify_utils.py) to replace the forward method of Phi-2's attention with the SelfExtend version.
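As a rough illustration of the idea (not the repository's actual modify_method_of_instance, whose signature is not shown in this thread), the sketch below rebinds `forward` on every matching attention instance of a toy model. `ToyAttention`, `ToyModel`, `patched_forward`, and `replace_forward` are hypothetical names used only for this example.

```python
import types


class ToyAttention:
    """Stand-in for an attention module (hypothetical, not Phi-2's real class)."""

    def forward(self, x):
        return f"original({x})"


class ToyModel:
    """Stand-in for a model holding several attention layers."""

    def __init__(self):
        self.layers = [ToyAttention(), ToyAttention()]


def patched_forward(self, x):
    """Hypothetical replacement; a real patch would implement SelfExtend attention."""
    return f"self_extend({x})"


def replace_forward(model, class_name, new_forward):
    """Bind new_forward as `forward` on every layer whose class name matches."""
    count = 0
    for layer in model.layers:
        if type(layer).__name__ == class_name:
            # MethodType binds the function to this one instance only,
            # so the class itself and other instances stay untouched.
            layer.forward = types.MethodType(new_forward, layer)
            count += 1
    return count


model = ToyModel()
n_patched = replace_forward(model, "ToyAttention", patched_forward)
print(n_patched, model.layers[0].forward("q"))
```

Patching instances rather than the class keeps the change local to the loaded model, which is presumably why the repository exposes a helper that works on instances.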

JoanZhou commented 3 months ago

Thank you for the reply. I am using phi_self_extend_patch_4_37.py with transformers 4.37.1, and I got an answer like this: [image] I tried a few parameter settings: group_size_1=4, group_size_2=1024 and group_size_1=4, group_size_2=512.
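For reference, here is a small calculation of the context length each of these settings could reach in theory. It assumes that group_size_1 is the group size G and group_size_2 is the neighbor window w_n, and it uses the bound (L - w_n) * G + w_n from the SelfExtend paper with Phi-2's 2k pretraining window as L. The function name is hypothetical.

```python
def max_extended_context(pretrain_window, group_size, neighbor_window):
    """Upper bound on context length: (L - w_n) * G + w_n (assumed from the paper)."""
    return (pretrain_window - neighbor_window) * group_size + neighbor_window


# Assumption: Phi-2's pretraining context window is 2048 tokens.
PHI2_WINDOW = 2048

limit_1024 = max_extended_context(PHI2_WINDOW, group_size=4, neighbor_window=1024)
limit_512 = max_extended_context(PHI2_WINDOW, group_size=4, neighbor_window=512)
print(limit_1024, limit_512)
```

Under these assumptions both settings comfortably cover a 1.5k-token input, so the parameter choice alone is unlikely to explain a failure at that length.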

Mooler0410 commented 3 months ago

Thanks for the feedback!

As we noted in a comment in the Phi-2 patch for 4.37:

"transformers version 4.37 (a future version). Should work for 'microsoft/phi-2', an official HF version of microsoft/phi-2; check the details on the Hugging Face Hub. It's different from the previous version for 'susnato/phi-2', which is the default version in transformers 4.36.2! Haven't done a comprehensive test, but it should work."

It may have some bugs. I will check it once I'm back from spring break.

Mooler0410 commented 3 months ago

Hi, we found that Phi-2 itself cannot do passkey retrieval well 🤣. Without SelfExtend, you can ask vanilla Phi-2 to do a 1.5k passkey retrieval, and it will fail just like the example you provided.

SelfExtend only elicits an LLM's existing long-context capabilities; it does not equip the model with any new capability. This means that if the model cannot do a task well within its own pretraining window, it cannot do it on longer contexts with SelfExtend either.

On other tasks, such as 'Needle in a Haystack', SelfExtend works well with the existing patch for Phi-2. Give it a try!

Mooler0410 commented 3 months ago

Hi, I happened to test Phi-2 with transformers==4.36.2 today. It seems that Phi-2 itself can do passkey retrieval with this version, while it cannot with transformers==4.38.2. You may want to take this into account.

Mooler0410 commented 3 months ago

My earlier conclusion (that Phi-2 itself can do passkey retrieval with transformers==4.36.2 but not with 4.38.2) is misleading. It is only one of the variants I tested that works well with transformers==4.36.2.

I tested more variants of Phi-2 and found that, with transformers==4.38.2, rhysjones/phi-2-orange-v2 works well, while "microsoft/phi-2" does not... It's super weird, considering that passkey retrieval is very simple.

All the tested models are vanilla (without SelfExtend). All the input sequences have a length of 1.5k, which is within the models' context window (2k).

Mooler0410 commented 3 months ago

I believe this will be my last response: I have figured out what is happening. Phi-2 is sensitive to the prompt format. Unlike other models, Phi-2 requires the recommended template: "To encourage the model to write more concise answers, you can also try the following QA format using 'Instruct: <prompt>\nOutput:'" (https://huggingface.co/microsoft/phi-2). With this template, Phi-2 can successfully do passkey retrieval with transformers==4.38.2.
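A minimal sketch of wrapping a passkey-retrieval input in that QA template is shown below. The filler text, the passkey wording, and the helper names `build_passkey_context` and `to_phi2_qa_prompt` are assumptions made for illustration; the repository's own passkey script may build its input differently.

```python
def build_passkey_context(passkey, n_filler_before=50, n_filler_after=50):
    """Hide a passkey sentence inside repeated filler text (hypothetical wording)."""
    filler = "The grass is green. The sky is blue. The sun is yellow. "
    needle = f"The pass key is {passkey}. Remember it. {passkey} is the pass key. "
    return filler * n_filler_before + needle + filler * n_filler_after


def to_phi2_qa_prompt(instruction):
    """Wrap text in the 'Instruct: <prompt>\\nOutput:' format from the Phi-2 model card."""
    return f"Instruct: {instruction}\nOutput:"


context = build_passkey_context("72498")
question = "What is the pass key?"
prompt = to_phi2_qa_prompt(context + question)

# Show only the tail of the prompt, where the question and template ending sit.
print(prompt[-60:])
```

The resulting string can be tokenized and passed to the model as usual; the only change from a plain passkey prompt is the `Instruct:` prefix and the trailing `Output:` line.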