VITA-Group / Ms-PoE

"Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding" Zhenyu Zhang, Runjin Chen, Shiwei Liu, Zhewei Yao, Olatunji Ruwase, Beidi Chen, Xiaoxia Wu, Zhangyang Wang.
MIT License

Problems in _head_wise_statistics() function #4

Open JingfenQiao opened 4 months ago

JingfenQiao commented 4 months ago

File "Ms-PoE/utils/modify_arch/llama.py", line 330, in forward self.head_order = self._head_wise_statistics(query_states, key_states, q_len, kv_seq_len, bsz, attention_mask) File "Ms-PoE/utils/modify_arch/llama.py", line 176, in _head_wise_statistics raise ValueError( ValueError: Attention weights should be of size (1, 32, 1793, 3586), but is torch.Size([1, 32, 1793, 1793])

The error is raised by the following check in llama.py: the key-length dimension of attn_weights does not match kv_seq_len.

```python
if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len):
    raise ValueError(
        f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
        f" {attn_weights.size()}"
    )
```
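For reference, here is a small, self-contained sketch of why the check can fail, using dummy sizes in place of the traceback's shapes: attn_weights is computed from the key_states that were actually passed in, so its last dimension tracks key_states.shape[-2] rather than the precomputed kv_seq_len. The guard at the end (validating against the keys actually used) is only a hypothetical workaround, not a fix from the maintainers.

```python
import torch

# Illustrative only: small dummy sizes stand in for the traceback's
# (bsz, num_heads, q_len, kv_seq_len) = (1, 32, 1793, 3586).
bsz, num_heads, head_dim = 1, 32, 8
q_len = 4          # new tokens in this forward pass
kv_seq_len = 8     # length the wrapper expects (e.g. including cached keys)

query_states = torch.randn(bsz, num_heads, q_len, head_dim)
key_states = torch.randn(bsz, num_heads, q_len, head_dim)  # only the new keys

attn_weights = torch.matmul(query_states, key_states.transpose(-2, -1))
print(attn_weights.shape)  # torch.Size([1, 32, 4, 4]) -> last dim != kv_seq_len (8)

# Hypothetical workaround: validate against the keys actually used,
# so the check stays consistent with the inputs that were passed in.
expected = (bsz, num_heads, q_len, key_states.shape[-2])
assert attn_weights.size() == expected
```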

Chungyezun commented 2 weeks ago

Hello, I'm facing a similar issue. Did you manage to find a solution for this?