Closed — DavideHe closed this issue 6 months ago
The position ids are [0, 1, 2, ..., l-1] and chunk_len is s. So the new position ids for the keys are position_ids % chunk_len. Please feel free to discuss with us if we misunderstand your question.
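A minimal sketch of what this means numerically (illustrative only, not the ChunkLlama source): with sequence length l = 10 and chunk_len s = 4, the key position ids wrap around 0..s-1 periodically, as in Formula 2 of the paper.

```python
# Key position ids under intra-chunk attention:
# P_k = [0, 1, ..., l-1] mod s  (Formula 2)
l, chunk_len = 10, 4
position_ids = list(range(l))                      # [0, 1, ..., l-1]
key_position_ids = [p % chunk_len for p in position_ids]
print(key_position_ids)                            # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
```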
But in the paper, Formula 2 shows Pintra = Pk = [0, 1, . . . , l − 1] mod s. In the code, `key_states = apply_rotary_pos_emb(key_states, k_cos, k_sin, position_ids)` — the position_ids are not modded before being passed to apply_rotary_pos_emb.
In my understanding, Pk does not change across chunks and repeats 0 to s-1 periodically, while Pq changes across chunks.
Your understanding is right! So I guess you must have missed this line.
I mean that `key_states = apply_rotary_pos_emb(key_states, k_cos, k_sin, position_ids % chunk_len)`
Sure! That would have the same effect, but you would have to re-initialize k_t. In our implementation, this line already applies % chunk_len in the RoPE embedding part.
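The point being made can be sketched as follows: the % chunk_len can live either at the call site or inside the embedding helper. The helper below and its names are hypothetical, not the repo's actual `self.rotary_emb`; it only illustrates that folding the mod into the cos/sin construction makes the key embeddings periodic in chunk_len.

```python
import math

def make_key_cos_sin(seq_len, chunk_len, dim, base=10000.0):
    """Hypothetical helper: build RoPE cos/sin tables for keys with the
    % chunk_len already applied to the position ids inside."""
    inv_freq = [1.0 / (base ** (2 * i / dim)) for i in range(dim // 2)]
    # Position ids for keys are taken modulo chunk_len here, so the
    # caller can pass plain key states without modding anything.
    pos = [t % chunk_len for t in range(seq_len)]
    cos = [[math.cos(p * f) for f in inv_freq] for p in pos]
    sin = [[math.sin(p * f) for f in inv_freq] for p in pos]
    return cos, sin

cos, sin = make_key_cos_sin(seq_len=8, chunk_len=4, dim=8)
# Rows repeat with period chunk_len: row 0 == row 4, row 1 == row 5, ...
```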
I understand, thank you for replying. In my view, I didn't want to initialize double cos/sin tables. I overlooked `self.rotary_emb` because no q, k are passed into it.
https://github.com/HKUNLP/ChunkLlama/blob/main/chunkllama_attn_replace.py#L133 In the paper, Pk = [0, 1, . . . , l − 1] mod s, but in the code it is not.
If I understand wrong, please point out my mistake.