Closed: JoanZhou closed this issue 3 months ago.
It's similar to Llama or Mistral. Just use the function modify_method_of_instance (in modify_utils.py) to replace the forward method of Phi-2's attention with the SelfExtend forward.
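For reference, here is a minimal sketch of what that wiring might look like, modeled on the repo's Llama/Mistral examples. The attention class name "PhiAttention", the patched function name self_extend_forward, and the group sizes are assumptions; check phi_self_extend_patch_4_37.py for the actual names.

```python
# Hedged sketch: apply the SelfExtend forward to Phi-2's attention layers.
# Names marked as assumptions below may differ in the actual patch file.
from functools import partial

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

import phi_self_extend_patch_4_37 as PhiSE          # the patch discussed in this thread
from modify_utils import modify_method_of_instance  # helper from this repo

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2", torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

# group_size_1 = group size, group_size_2 = neighbor window; values are illustrative.
se_forward = partial(PhiSE.self_extend_forward, group_size_1=4, group_size_2=1024)

# Replace the forward method of every attention module whose class name matches.
# "PhiAttention" is an assumption based on modeling_phi.py in transformers 4.37.
modify_method_of_instance(model, "PhiAttention", "forward", se_forward)
```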
Thank you for the reply. I am using phi_self_extend_patch_4_37.py with transformers 4.37.1, and I got an answer like this:
Tried a few parameters:
group_size_1=4, group_size_2=1024
group_size_1=4, group_size_2=512
Thanks for the feedback!
As we commented in the patch for Phi-2 on 4.37:
"transformers version 4.37 (a future version) should work for 'microsoft/phi-2', the official HF version of microsoft/phi-2; check the details on the Hugging Face Hub. It's different from the previous version for 'susnato/phi-2', which is the default version in transformers 4.36.2! Haven't done comprehensive tests, but it should work."
It may have some bugs. Will check it once I'm back from the spring break.
Hi, we found that Phi-2 itself cannot do passkey retrieval well 🤣. Without SelfExtend, you may ask vanilla Phi-2 to do a 1.5k passkey retrieval (roughly like the sketch below) and it will fail like the example you provided.
SelfExtend only elicits an LLM's long-context capabilities; it does not equip the model with any new capability. This means that if the model cannot do a task well within its own pretraining window, it cannot do it on longer contexts with SelfExtend either.
With the existing patch for Phi-2, SelfExtend works well on other tasks, such as 'Needle in a Haystack'. Have a try!
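To make the failure mode above concrete, here is an illustrative sketch of what a ~1.5k-token passkey-retrieval prompt looks like; the filler text, prompt wording, and length estimate are assumptions for the example, not the repo's evaluation script.

```python
# Illustrative sketch: bury a random 5-digit passkey inside repetitive filler text.
import random

def build_passkey_prompt(n_filler: int = 30) -> tuple[str, int]:
    """Return (prompt, passkey); n_filler=30 gives roughly 1.5k tokens in total (rough estimate)."""
    passkey = random.randint(10000, 99999)
    filler = ("The grass is green. The sky is blue. The sun is yellow. "
              "Here we go. There and back again. ") * n_filler
    prompt = (
        "There is important info hidden inside a lot of irrelevant text. "
        "Find it and memorize it.\n"
        f"{filler}\n"
        f"The pass key is {passkey}. Remember it. {passkey} is the pass key.\n"
        f"{filler}\n"
        "What is the pass key?"
    )
    return prompt, passkey

prompt, passkey = build_passkey_prompt()
print(f"expected passkey: {passkey}, prompt length (chars): {len(prompt)}")
```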
Hi, I happened to test Phi-2 with 4.36.2 today. It seems that Phi-2 itself can do passkey retrieval with this version, while it cannot with transformers==4.38.2. You may consider this.
This conclusion is misleading: it was only one of the variants I tested that works well with transformers==4.36.2.
I tested more variants of Phi-2 and found that, with transformers==4.38.2, rhysjones/phi-2-orange-v2 works well, while "microsoft/phi-2" does not... It's super weird, considering that passkey retrieval is very simple.
All the tested models are vanilla (without SelfExtend). All the input sequences have a length of 1.5k, which is within Phi-2's context window (2k).
I believe this response will be the last one. I have figured out what is happening: Phi-2 is sensitive to the prompt format. Unlike other models, Phi-2 requires the QA template recommended in its model card: "To encourage the model to write more concise answers, you can also try the following QA format using 'Instruct: <prompt>\nOutput:'".
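For illustration, a minimal sketch of wrapping a query in that QA template. The "Instruct:/Output:" string follows the microsoft/phi-2 model card; the model loading and generation settings are just assumptions for the example, not the script used in this thread.

```python
# Hedged sketch: querying Phi-2 with the recommended "Instruct:/Output:" QA template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2", torch_dtype=torch.float16, device_map="auto"
)

# In the passkey test, `question` would be the full ~1.5k-token context plus the question.
question = "What is the pass key mentioned in the text above?"
prompt = f"Instruct: {question}\nOutput:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=False)
answer = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer)
```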
Could you please release the example script for Phi-2? Thanks.