Glaciohound / LM-Infinite

Implementation of NAACL 2024 Outstanding Paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"
https://arxiv.org/abs/2308.16137
MIT License

GPTNeoX or Transformers support? #1

Closed: fblgit closed this issue 11 months ago

fblgit commented 11 months ago

I'm trying to integrate LM-Infinite into GPTNeoX (pythia-deduped). I managed to get the lambda_attn working, but the rotary embedding implementation in GPTNeoX is a bit different, and the attention projection is a single fused 3 * hidden_size layer that forms QKV, whereas the other models use separate, independent Q/K/V layers of 1 * hidden_size each. It trains fine, but during inference or evaluation (single batch) I get stuck on a shape mismatch.
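Roughly, the layout difference I mean looks like this (just a minimal sketch with made-up sizes, not the actual pythia or LM-Infinite code):

```python
# Sketch of the projection-layout difference behind the shape mismatch:
# GPTNeoX fuses Q/K/V into one 3 * hidden_size linear, while e.g. LLaMA
# keeps three independent hidden_size projections.
import torch
import torch.nn as nn

batch, seq, hidden, n_heads = 2, 16, 64, 4
head_dim = hidden // n_heads
x = torch.randn(batch, seq, hidden)

# GPTNeoX style: one fused projection, then split per head.
query_key_value = nn.Linear(hidden, 3 * hidden)
qkv = query_key_value(x).view(batch, seq, n_heads, 3 * head_dim)
q_neox, k_neox, v_neox = qkv.split(head_dim, dim=-1)  # each [batch, seq, n_heads, head_dim]

# LLaMA style: three separate projections.
q_proj, k_proj, v_proj = (nn.Linear(hidden, hidden) for _ in range(3))
q = q_proj(x).view(batch, seq, n_heads, head_dim)
k = k_proj(x).view(batch, seq, n_heads, head_dim)
v = v_proj(x).view(batch, seq, n_heads, head_dim)

# A patched attention (e.g. a lambda_attn replacement) has to normalize both
# layouts to the same per-head shape before applying rotary and the mask.
assert q_neox.shape == q.shape
```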

I did manage to see the training benefit of lambda_attn, with higher it/s. The GPU metrics are smoother and steadier at high throughput. The CPU also shows higher compute demand compared to traditional training, but it doesn't appear to cause any contention during training. As a test, I managed to train with a larger context on the same hardware and at higher performance, so this clearly works.

I was wondering whether having a folder, or a separate repo, with these modeling_$model.py files that can be dropped into transformers would help simplify setup and adoption?

Glaciohound commented 11 months ago

Hello! It is a pleasure to see that this repo is of interest and help to you in some ways.

As for your suggestion, I am not sure I fully follow. Are you suggesting that, instead of the current implementation, which hijacks the attn.forward functions after loading the models, we should provide a whole separate modeling_$model.py containing the complete model and layer classes? I am afraid this would increase the maintenance overhead, especially whenever the modeling_$model.py files in the upstream Transformers library are updated to newer versions.
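For reference, the hijacking approach I mean is roughly this pattern (a sketch only; my_lambda_attn_forward is a placeholder, and the pythia checkpoint is just an example, not the exact code in this repo):

```python
# Sketch of patching the attention forward after loading, instead of
# shipping a separate modeling_$model.py.
import types
from transformers import AutoModelForCausalLM

def my_lambda_attn_forward(self, hidden_states, *args, **kwargs):
    # A replacement attention forward (e.g. the LM-Infinite masked attention)
    # would go here; it can reuse the module's own projection layers via
    # `self`, so no separate model class is needed.
    raise NotImplementedError

model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m-deduped")
for layer in model.gpt_neox.layers:  # e.g. model.model.layers / .self_attn for LLaMA-style models
    layer.attention.forward = types.MethodType(my_lambda_attn_forward, layer.attention)
```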

Nevertheless, I strongly encourage you to experiment with the code yourself. If you can demonstrate that this architecture has more benefits, I am happy to adopt it! In the coming weeks I might be too busy to maintain the repo myself (due to lots of deadlines), but I look forward to your updates.

fblgit commented 11 months ago

It is easy to pip install the transformers library from a repo that carries the changes, and that is also easier to port. But this way works too. Just to note that inside the GPT-J model class there is another nested model object holding the actual model. Will keep testing and let you know, thanks.