Open · AshwinRamachandran2002 opened this issue 11 months ago
Hi, I was going through your code to understand how you calculate the RoPE embeddings, and I need a clarification.

When assigning a relative position to a newly generated token, the base reference is taken as the end of the prompt input:
https://github.com/abertsch72/unlimiformer/blob/232fc235706c304667f7a671cca2203d4625eaa1/src/unlimiformer.py#L1084C10-L1084C10

When assigning relative positions to the retrieved key indices, the base reference is taken as the start of the prompt input:
https://github.com/abertsch72/unlimiformer/blob/232fc235706c304667f7a671cca2203d4625eaa1/src/unlimiformer.py#L1123

Would it not then be the case that the current hidden state gives more attention to tokens somewhere in the middle of the prompt, with the attention decaying both to the right and to the left?

Thank you,
Ashwin Ramachandran
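To make the two indexing schemes in the question concrete, here is a minimal, self-contained sketch using a textbook RoPE implementation; `apply_rope`, the toy sizes, and the exact position assignments are illustrative assumptions, not the repository's actual code.

```python
import torch

def apply_rope(x, pos, base=10000.0):
    # Standard rotary embedding (HF Llama half-split convention) of a
    # single vector `x` at integer position `pos`.
    d = x.numel()
    theta = base ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)
    ang = pos * theta
    cos = torch.cat((ang.cos(), ang.cos()))
    sin = torch.cat((ang.sin(), ang.sin()))
    x1, x2 = x.chunk(2)
    return x * cos + torch.cat((-x2, x1)) * sin

torch.manual_seed(0)
dim, prompt_len, step = 64, 100, 5   # hypothetical sizes
q, k = torch.randn(dim), torch.randn(dim)

# Reading of the L1084 scheme: the generated token's position is
# referenced to the END of the prompt, so at decoding step `step` it
# sits at prompt_len + step.
q_pos = prompt_len + step

# Reading of the L1123 scheme: retrieved keys keep positions referenced
# to the START of the prompt, i.e. 0 .. prompt_len - 1.
scores = torch.stack([apply_rope(q, q_pos) @ apply_rope(k, n)
                      for n in range(prompt_len)])

# RoPE logits depend only on the offset (q_pos - n), so inspecting
# `scores` as a function of n is one way to probe where attention mass
# concentrates under this mixed indexing.
print(scores.argmax().item(), scores.max().item())
```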
Hi @AshwinRamachandran2002, thank you for your interest in our work!

Your reading is correct, and you are looking at the right places in the code. These are the settings that we found to work best in our initial experiments; I agree that they may not be optimal. But I can't say whether "the current hidden state gives more attention to the tokens somewhere in the middle of the prompt and then decays both to the right and left" - it's a hypothesis that is worth checking, and possibly fixing (and writing a paper about if you manage to do that :-) ).

Please let us know if you have any questions!
Uri
Thank you for your reply! I would also like to know how you decided upon the vectorstore query.

https://github.com/abertsch72/unlimiformer/blob/232fc235706c304667f7a671cca2203d4625eaa1/src/unlimiformer.py#L1098

You have used an approximation of R(m) * W_k as W_k + Rotated(W_k). Did you also consider dropping R(m)?
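For context on the notation above: in standard RoPE (HF Llama convention), the rotation of a key vector decomposes as R(m) k = k * cos(m * theta) + rotate_half(k) * sin(m * theta), which is presumably what "W_k + Rotated(W_k)" refers to. Below is a hedged sketch of that identity; the function names are illustrative and not taken from the unlimiformer repo.

```python
import torch

def rotate_half(x):
    # HF Llama convention: split the last dim in half, map (x1, x2) -> (-x2, x1).
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(k, pos, base=10000.0):
    d = k.shape[-1]
    theta = base ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)
    ang = pos * theta
    cos = torch.cat((ang.cos(), ang.cos()), dim=-1)
    sin = torch.cat((ang.sin(), ang.sin()), dim=-1)
    # Exact rotation: R(pos) k = k * cos + rotate_half(k) * sin.
    # Treating cos/sin as constants turns this into a fixed linear mix of
    # k and rotate_half(k); dropping R(m) altogether would be the further
    # approximation cos ~ 1, sin ~ 0.
    return k * cos + rotate_half(k) * sin

k = torch.randn(8, 64)              # a batch of hypothetical key vectors
print(apply_rope(k, pos=10).shape)  # torch.Size([8, 64])
```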