ggerganov / ggml

Tensor library for machine learning
MIT License
11.26k stars 1.05k forks source link

ggml : adjust corr_dims[0]/low rope dimension #913

Open danbev opened 3 months ago

danbev commented 3 months ago

This commit adjusts the low dimension of the rope to be used in the ramp calculation to account for the dimension index starting at 0.

The motivation for this changes is that currently, if for example the low dimension is 20, the actual dimension where the range will start will be 21. I'm not sure if this is intended or not, but I wanted to raise this issue/question just in case.

For example, printing some debugging information in the rope_yarn_ramp function we can see the following:

corr_dims[0] = 20 corr_dims[1] = 46
0  i0 = 0,  theta = 0.000000, ramp_mix = 1.000000
1  i0 = 2,  theta = 0.000000, ramp_mix = 1.000000
2  i0 = 4,  theta = 0.000000, ramp_mix = 1.000000
3  i0 = 6,  theta = 0.000000, ramp_mix = 1.000000
4  i0 = 8,  theta = 0.000000, ramp_mix = 1.000000
5  i0 = 10, theta = 0.000000, ramp_mix = 1.000000
6  i0 = 12, theta = 0.000000, ramp_mix = 1.000000
7  i0 = 14, theta = 0.000000, ramp_mix = 1.000000
8  i0 = 16, theta = 0.000000, ramp_mix = 1.000000
9  i0 = 18, theta = 0.000000, ramp_mix = 1.000000
10 i0 = 20, theta = 0.000000, ramp_mix = 1.000000
11 i0 = 22, theta = 0.000000, ramp_mix = 1.000000
12 i0 = 24, theta = 0.000000, ramp_mix = 1.000000
13 i0 = 26, theta = 0.000000, ramp_mix = 1.000000
14 i0 = 28, theta = 0.000000, ramp_mix = 1.000000
15 i0 = 30, theta = 0.000000, ramp_mix = 1.000000
16 i0 = 32, theta = 0.000000, ramp_mix = 1.000000
17 i0 = 34, theta = 0.000000, ramp_mix = 1.000000
18 i0 = 36, theta = 0.000000, ramp_mix = 1.000000
19 i0 = 38, theta = 0.000000, ramp_mix = 1.000000
20 i0 = 40, theta = 0.000000, ramp_mix = 1.000000
21 i0 = 42, theta = 0.000000, ramp_mix = 0.961538 <--- low
22 i0 = 44, theta = 0.000000, ramp_mix = 0.923077
23 i0 = 46, theta = 0.000000, ramp_mix = 0.884615
24 i0 = 48, theta = 0.000000, ramp_mix = 0.846154
25 i0 = 50, theta = 0.000000, ramp_mix = 0.807692
26 i0 = 52, theta = 0.000000, ramp_mix = 0.769231
27 i0 = 54, theta = 0.000000, ramp_mix = 0.730769
28 i0 = 56, theta = 0.000000, ramp_mix = 0.692308
29 i0 = 58, theta = 0.000000, ramp_mix = 0.653846
30 i0 = 60, theta = 0.000000, ramp_mix = 0.615385
31 i0 = 62, theta = 0.000000, ramp_mix = 0.576923
32 i0 = 64, theta = 0.000000, ramp_mix = 0.538462
33 i0 = 66, theta = 0.000000, ramp_mix = 0.500000
34 i0 = 68, theta = 0.000000, ramp_mix = 0.461538
35 i0 = 70, theta = 0.000000, ramp_mix = 0.423077
36 i0 = 72, theta = 0.000000, ramp_mix = 0.384615
37 i0 = 74, theta = 0.000000, ramp_mix = 0.346154
38 i0 = 76, theta = 0.000000, ramp_mix = 0.307692
39 i0 = 78, theta = 0.000000, ramp_mix = 0.269231
40 i0 = 80, theta = 0.000000, ramp_mix = 0.230769
41 i0 = 82, theta = 0.000000, ramp_mix = 0.192308
42 i0 = 84, theta = 0.000000, ramp_mix = 0.153846
43 i0 = 86, theta = 0.000000, ramp_mix = 0.115385
44 i0 = 88, theta = 0.000000, ramp_mix = 0.076923
45 i0 = 90, theta = 0.000000, ramp_mix = 0.038462
46 i0 = 92, theta = 0.000000, ramp_mix = 0.000000 <--- high
                   ...

With the change in this commit this will be adjusted to:

0  i0 = 0,  theta = 0.000000, ramp_mix = 1.000000
1  i0 = 2,  theta = 0.000000, ramp_mix = 1.000000
2  i0 = 4,  theta = 0.000000, ramp_mix = 1.000000
3  i0 = 6,  theta = 0.000000, ramp_mix = 1.000000
4  i0 = 8,  theta = 0.000000, ramp_mix = 1.000000
5  i0 = 10, theta = 0.000000, ramp_mix = 1.000000
6  i0 = 12, theta = 0.000000, ramp_mix = 1.000000
7  i0 = 14, theta = 0.000000, ramp_mix = 1.000000
8  i0 = 16, theta = 0.000000, ramp_mix = 1.000000
9  i0 = 18, theta = 0.000000, ramp_mix = 1.000000
10 i0 = 20, theta = 0.000000, ramp_mix = 1.000000
11 i0 = 22, theta = 0.000000, ramp_mix = 1.000000
12 i0 = 24, theta = 0.000000, ramp_mix = 1.000000
13 i0 = 26, theta = 0.000000, ramp_mix = 1.000000
14 i0 = 28, theta = 0.000000, ramp_mix = 1.000000
15 i0 = 30, theta = 0.000000, ramp_mix = 1.000000
16 i0 = 32, theta = 0.000000, ramp_mix = 1.000000
17 i0 = 34, theta = 0.000000, ramp_mix = 1.000000
18 i0 = 36, theta = 0.000000, ramp_mix = 1.000000
19 i0 = 38, theta = 0.000000, ramp_mix = 1.000000
20 i0 = 40, theta = 0.000000, ramp_mix = 0.962963 <-- low
21 i0 = 42, theta = 0.000000, ramp_mix = 0.925926
22 i0 = 44, theta = 0.000000, ramp_mix = 0.888889
23 i0 = 46, theta = 0.000000, ramp_mix = 0.851852
24 i0 = 48, theta = 0.000000, ramp_mix = 0.814815
25 i0 = 50, theta = 0.000000, ramp_mix = 0.777778
26 i0 = 52, theta = 0.000000, ramp_mix = 0.740741
27 i0 = 54, theta = 0.000000, ramp_mix = 0.703704
28 i0 = 56, theta = 0.000000, ramp_mix = 0.666667
29 i0 = 58, theta = 0.000000, ramp_mix = 0.629630
30 i0 = 60, theta = 0.000000, ramp_mix = 0.592593
31 i0 = 62, theta = 0.000000, ramp_mix = 0.555556
32 i0 = 64, theta = 0.000000, ramp_mix = 0.518519
33 i0 = 66, theta = 0.000000, ramp_mix = 0.481481
34 i0 = 68, theta = 0.000000, ramp_mix = 0.444444
35 i0 = 70, theta = 0.000000, ramp_mix = 0.407407
36 i0 = 72, theta = 0.000000, ramp_mix = 0.370370
37 i0 = 74, theta = 0.000000, ramp_mix = 0.333333
38 i0 = 76, theta = 0.000000, ramp_mix = 0.296296
39 i0 = 78, theta = 0.000000, ramp_mix = 0.259259
40 i0 = 80, theta = 0.000000, ramp_mix = 0.222222
41 i0 = 82, theta = 0.000000, ramp_mix = 0.185185
42 i0 = 84, theta = 0.000000, ramp_mix = 0.148148
43 i0 = 86, theta = 0.000000, ramp_mix = 0.111111
44 i0 = 88, theta = 0.000000, ramp_mix = 0.074074
45 i0 = 90, theta = 0.000000, ramp_mix = 0.037037
46 i0 = 92, theta = 0.000000, ramp_mix = 0.000000 <-- high
                 ...

This will make the ranges start at the correct dimensions (assuming that the change in this commit are correct).