datamllab / LongLM

[ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
https://arxiv.org/pdf/2401.01325.pdf
MIT License

Support for Phi2 / Mixformer #18

Closed by anthony-chaudhary 4 months ago

anthony-chaudhary commented 4 months ago

Great work!!

It would be super to have deeper support for Phi2 / Mixformer, e.g. https://huggingface.co/amgadhasan/phi-2

Edit: I tried the existing Phi patches with some modifications, but some of their core assumptions don't seem to hold for this model, e.g. AttributeError: 'MixFormerSequentialForCausalLM' object has no attribute 'q_proj'

Mooler0410 commented 4 months ago

Hi! We currently have no plans to support Mixformer. To use Phi-2, you can instead try susnato/Phi-2 with transformers 4.36, or Microsoft's official microsoft/phi-2 with transformers >= 4.37.
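
The version rule above could be sketched as a small helper. This is only an illustration: `pick_phi2_checkpoint` is a hypothetical name, not part of LongLM or transformers, and it assumes the version string starts with `major.minor` (as `transformers.__version__` normally does).

```python
import re


def pick_phi2_checkpoint(transformers_version: str) -> str:
    """Return the Phi-2 checkpoint id suggested in this thread for a given
    transformers version string (hypothetical helper, for illustration only)."""
    match = re.match(r"(\d+)\.(\d+)", transformers_version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {transformers_version!r}")

    major_minor = (int(match.group(1)), int(match.group(2)))

    # transformers >= 4.37: use Microsoft's official checkpoint.
    if major_minor >= (4, 37):
        return "microsoft/phi-2"
    # transformers 4.36: use the susnato port.
    if major_minor == (4, 36):
        return "susnato/Phi-2"
    # Older versions are not covered by the advice in this thread.
    raise ValueError(
        f"No Phi-2 checkpoint suggested for transformers {transformers_version}"
    )


if __name__ == "__main__":
    print(pick_phi2_checkpoint("4.36.2"))
    print(pick_phi2_checkpoint("4.37.0"))
```

In practice you would pass `transformers.__version__` to the helper and hand the returned id to `AutoModelForCausalLM.from_pretrained` before applying the Self-Extend patch.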