QwenLM / Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

💡 [REQUEST] - How to apply SelfExtend to Qwen-14B and Qwen-72B #937

Closed ArcherShirou closed 2 months ago

ArcherShirou commented 6 months ago

In the paper LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning, the authors describe a method to extend the context window of any RoPE-based model at inference time, without fine-tuning. The results reported in the paper seem game-changing. I tried to apply it to Qwen myself, but I failed to modify modeling_qwen.py correctly. How could support for this be added to Qwen? The paper gives the following pseudocode:

```python
# q, k, v: queries, keys, and values
# seq_len, pos: input sequence length, position indices
g_size, w_size = G, w_n  # group size and neighbor-window size

# normal self-attention (exact positions, used inside the window)
ngb_q = apply_pos_emcode(q, pos)
ngb_k = apply_pos_emcode(k, pos)
ngb_attn = matmul(ngb_q, ngb_k)
ngb_attn = causal_mask(ngb_attn)

# grouped self-attention (floored positions, used beyond the window)
g_pos = pos // g_size  # the floor operation
shift = w_size - w_size // g_size
s_g_pos = g_pos + shift
g_q = apply_pos_emcode(q, s_g_pos)
g_k = apply_pos_emcode(k, g_pos)
g_attn = matmul(g_q, g_k)
g_attn = causal_mask(g_attn)

# merge: neighbor attention within w_size, grouped attention outside
g_mask = tril(ones([seq_len - w_size, seq_len - w_size]))
mask = ones([seq_len, seq_len])
mask[w_size:, :-w_size] -= g_mask

attn = where(mask, ngb_attn, g_attn)  # merge by replacement

attn_weights = softmax(attn)
output = matmul(attn_weights, v)
```
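
To make the merge concrete, here is a minimal runnable PyTorch sketch of the same logic for a single head. It substitutes a toy additive positional encoding for RoPE, so `apply_pos`, `self_extend_attn`, and the `g_size`/`w_size` values are illustrative stand-ins, not Qwen's actual rotary implementation:

```python
# Minimal single-head sketch of the SelfExtend merge above (toy positional
# encoding in place of RoPE; illustration only, not Qwen's modeling code).
import torch
import torch.nn.functional as F

def self_extend_attn(q, k, v, g_size=4, w_size=8):
    # q, k, v: [seq_len, head_dim]
    seq_len, dim = q.shape
    pos = torch.arange(seq_len)

    def apply_pos(x, p):
        # Toy additive positional encoding; RoPE would be used in practice.
        return x + p.unsqueeze(-1).float() * 0.01

    neg_inf = torch.finfo(q.dtype).min
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

    # normal (neighbor) self-attention with exact positions
    ngb_attn = apply_pos(q, pos) @ apply_pos(k, pos).T / dim ** 0.5
    ngb_attn = ngb_attn.masked_fill(~causal, neg_inf)

    # grouped self-attention with floored positions; queries are shifted so
    # the two position streams line up at the window boundary
    g_pos = pos // g_size
    shift = w_size - w_size // g_size
    g_attn = apply_pos(q, g_pos + shift) @ apply_pos(k, g_pos).T / dim ** 0.5
    g_attn = g_attn.masked_fill(~causal, neg_inf)

    # keep neighbor attention inside the window, grouped attention outside
    mask = torch.ones(seq_len, seq_len, dtype=torch.bool)
    g_mask = torch.tril(torch.ones(seq_len - w_size, seq_len - w_size,
                                   dtype=torch.bool))
    mask[w_size:, :-w_size] &= ~g_mask

    attn = torch.where(mask, ngb_attn, g_attn)  # merge by replacement
    return F.softmax(attn, dim=-1) @ v

# toy check
q, k, v = (torch.randn(16, 32) for _ in range(3))
print(self_extend_attn(q, k, v).shape)  # torch.Size([16, 32])
```

Note that both attention maps are computed over the full sequence and the window mask merely selects between them, which is why the method needs no fine-tuning.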
WeixuanXiong commented 4 months ago

Same question here!

Mooler0410 commented 4 months ago

We recently added support for Qwen1.5! Give it a try!
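
For reference, a usage sketch along the lines of the authors' LongLM (SelfExtend) repository README; the `SelfExtend` import, the `apply` signature, and the parameter values here are assumptions to verify against that repo:

```python
# Hypothetical usage sketch based on the LongLM (SelfExtend) README; check
# the exact entry point and arguments against the repository.
from transformers import AutoModelForCausalLM, AutoTokenizer
import SelfExtend  # from the authors' LongLM repository

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-14B-Chat")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-14B-Chat")

# Patch attention in place: the two values mirror G (group size) and
# w_n (neighbor-window size) in the pseudocode above.
SelfExtend.apply(model, 8, 1024)
```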

github-actions[bot] commented 3 months ago

This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.