datamllab / LongLM

[ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
https://arxiv.org/pdf/2401.01325.pdf
MIT License

Differences with ReRoPE #40

Closed siyuanseever closed 4 months ago

siyuanseever commented 4 months ago

Differences with ReRoPE

Self-Extend:

    tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [2, 1, 0, 0, 0, 0, 0, 0, 0, 0],
            [3, 2, 1, 0, 0, 0, 0, 0, 0, 0],
            [4, 3, 2, 1, 0, 0, 0, 0, 0, 0],
            [4, 4, 3, 2, 1, 0, 0, 0, 0, 0],
            [5, 5, 4, 3, 2, 1, 0, 0, 0, 0],
            [5, 5, 4, 4, 3, 2, 1, 0, 0, 0],
            [6, 6, 5, 5, 4, 3, 2, 1, 0, 0],
            [6, 6, 5, 5, 4, 4, 3, 2, 1, 0]])

ReRoPE:

    tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [2, 1, 0, 0, 0, 0, 0, 0, 0, 0],
            [3, 2, 1, 0, 0, 0, 0, 0, 0, 0],
            [4, 3, 2, 1, 0, 0, 0, 0, 0, 0],
            [5, 4, 3, 2, 1, 0, 0, 0, 0, 0],
            [6, 5, 4, 3, 2, 1, 0, 0, 0, 0],
            [6, 6, 5, 4, 3, 2, 1, 0, 0, 0],
            [6, 6, 6, 5, 4, 3, 2, 1, 0, 0],
            [6, 6, 6, 6, 5, 4, 3, 2, 1, 0]])

| | Self-Extend @ 256 | Self-Extend @ 512 | Self-Extend @ 1024 | ReRoPE @ 256 | ReRoPE @ 512 | ReRoPE @ 1024 |
|---|---|---|---|---|---|---|
| base | 1.0464 | 2.2302 | 2.4575 | 1.0464 | 0.8970 | 2.1020 |
| +logn | 1.0464 | 2.2816 | 2.8442 | 1.0464 | 0.8832 | 1.8993 |
| +keynorm | 2.1632 | 2.5168 | 3.2525 | 2.1632 | 3.1556 | 4.6036 |

Mooler0410 commented 4 months ago

You may want to check the section "How to choose the group_size and neighbor_window". It covers how to select the two hyperparameters, group size and window size. Different models may follow different empirical rules, but in any case window_size = training_size/2 is too large.

ReRoPE is a special case of SelfExtend with group_size = +∞ (or any large enough value), not with window_size = training_size - 1. You may also refer to "How to choose the group_size and neighbor_window" for more results. Some of the settings in that section are close to ReRoPE, and SelfExtend does better there. Our paper also discusses the relationship between SelfExtend and existing methods such as T5, iRPE and ReRoPE. You can take a look at it for more details.
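
To make the "special case" point concrete, here is a minimal pure-Python sketch that rebuilds the two 10x10 relative-position matrices posted above. It is only an illustration, not the code in this repo. The function names are hypothetical, and the settings (group_size=2 and window=4 for Self-Extend, window=6 for ReRoPE) are inferred from the posted matrices, not stated by the poster. The assumed rule is: keys closer than `window` keep their normal relative position, and farther keys use floor-divided positions shifted by `window - window // group_size`.

```python
def self_extend_positions(n, group_size, window):
    """Relative position for query i attending to key j (assumed rule, see lead-in)."""
    shift = window - window // group_size
    rows = []
    for i in range(n):
        row = []
        for j in range(n):
            if j > i:
                row.append(0)            # future keys are masked; shown as 0 like in the post
            elif i - j < window:
                row.append(i - j)        # neighbor window: ordinary relative position
            else:
                # grouped attention: floor-divided positions plus a shift
                row.append(i // group_size - j // group_size + shift)
        rows.append(row)
    return rows


def rerope_positions(n, window):
    """ReRoPE-style matrix: relative position clipped at `window`."""
    return [[min(i - j, window) if j <= i else 0 for j in range(n)]
            for i in range(n)]


# Settings inferred from the matrices in the issue (assumption).
self_extend_matrix = self_extend_positions(10, group_size=2, window=4)
rerope_matrix = rerope_positions(10, window=6)

# With a very large group size every grouped position collapses to `window`,
# which is exactly the clipping that ReRoPE applies.
same_as_rerope = self_extend_positions(10, group_size=10**9, window=6) == rerope_matrix
```

Under these assumptions both posted matrices are reproduced, and `same_as_rerope` is `True`: once `group_size` is at least the sequence length, `i // group_size - j // group_size` is 0 and the shift equals `window`, so every key outside the window gets position `window`.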