Open yuanrr opened 3 days ago
Hi Bowen,
According to our experimental results, it is better to trigger the retracing procedure in early layers. For models with 32 layers, that usually means layers 5-16, although most retracing happens at layers 7, 8, and 9. Following this idea, I would suggest you apply the same settings to the Qwen2.5-7b model, and you may test whether this rule still holds for larger ones.
How does the ending layer, in particular, affect performance? We believe that retracing in deep layers gives worse results, as the language model has fewer processing steps left to deal with the retraced information. So the starting and ending layers simply serve as limits to ensure MemVR is triggered within the expected range.
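To make the role of the two limits concrete, here is a minimal sketch of the trigger logic described above: retracing fires at the first layer inside `[start_layer, end_layer]` whose uncertainty exceeds a threshold, and never outside that window. The function name, the threshold value, and the plain-list representation of per-layer uncertainty are all illustrative assumptions, not the actual MemVR implementation.

```python
from typing import List, Optional

def first_trigger_layer(uncertainty: List[float],
                        start_layer: int,
                        end_layer: int,
                        threshold: float = 0.75) -> Optional[int]:
    """Return the first layer in [start_layer, end_layer] whose
    uncertainty exceeds the threshold, or None if no layer does.
    Layers outside the window are ignored, so the start/end layers
    act purely as bounds on where retracing may fire."""
    for layer in range(start_layer, end_layer + 1):
        if uncertainty[layer] > threshold:
            return layer
    return None
```

For example, if the first five layers all show high uncertainty but `start_layer=5`, those early layers are skipped and retracing only triggers on a later spike inside the window.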
Should you have any questions please feel free to discuss!
Hello, thank you for your reply! It helped me a lot. I do have some questions and hope to get your help. I ask because Qwen is a 28-layer model, so it might behave a little differently. When I used the entropy you designed to observe the uncertainty, I found two phenomena:

1. The first 10 layers stay above 0.9.
2. Around the 20th layer, the entropy suddenly rises above the threshold, even if the entropy of the previous layer is very small.

I haven't tried this on a 32-layer model, so I don't know if there is a similar phenomenon. In this case, should I search for a start layer > 10 and an end layer < 20?
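For readers reproducing this observation: values like "above 0.9" come from normalizing the Shannon entropy of each layer's token distribution by `log(vocab_size)`, so it lies in [0, 1]. A minimal sketch of that normalization is below; in practice the per-layer logits would come from projecting intermediate hidden states through the model's output head, which this toy example does not include.

```python
import numpy as np

def normalized_entropy(logits: np.ndarray) -> float:
    """Shannon entropy of softmax(logits), divided by log(vocab_size)
    so the result lies in [0, 1]. Values near 1 mean a near-uniform
    (highly uncertain) distribution; values near 0 mean a confident one."""
    z = logits - logits.max()           # subtract max for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    h = -(p * np.log(p + 1e-12)).sum()  # small epsilon guards log(0)
    return float(h / np.log(len(p)))
```

A uniform distribution over the vocabulary yields a value of 1.0, while a sharply peaked one yields a value close to 0, which is why early, undecided layers can all sit above 0.9.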
Hi! I would recommend starting with layer 5 and seeing if the result is satisfying. If not, move the starting layer slightly deeper and try again. Different models do vary in their parameter settings, so all you need to do is conduct some quick evaluations to find the optimal values.
But I think starting from layer 10 won't work well, as that would be too deep for MemVR.
Thank you for your help. I will try according to your suggestion.
Great job! I am currently trying your work on the Qwen2.5 model and would like to ask how to decide on the starting and ending layers. How does the ending layer, in particular, affect performance?