jakebonk opened this issue 1 year ago:

> Do you have any thoughts on converting Llama/Alpaca/Vicuna models with LSG? Or would that be more difficult? It looks like they use rotary positional embeddings (RoPE) instead of absolute positional embeddings.
Hi @jakebonk,
The HF team added the Llama model a few days ago. From what I can see in that implementation, it should be possible to add LSG attention to a Llama model. RoPE isn't a problem, since you can apply it before computing the score matrix (see the sketch below).
I need to investigate this; I'll let you know within the week whether I can create a conversion script.
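For intuition, here is a minimal, hypothetical sketch (plain PyTorch, not code from this repo or a conversion script) of what "applying RoPE before the score matrix" means: the query/key vectors are rotated first, following the half-rotation convention used in HF's Llama implementation, so any downstream attention pattern, including LSG-style sparse/local block attention, can score the already-rotated vectors the same way it would score absolute-position embeddings.

```python
import torch

def rotate_half(x):
    # Split the head dimension in two and rotate: (x1, x2) -> (-x2, x1).
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(q, k, positions, head_dim, base=10000.0):
    # q, k: (batch, heads, seq_len, head_dim); positions: (seq_len,)
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    freqs = torch.outer(positions.float(), inv_freq)  # (seq_len, head_dim/2)
    emb = torch.cat((freqs, freqs), dim=-1)           # (seq_len, head_dim)
    cos, sin = emb.cos(), emb.sin()
    # Broadcasts over the batch and head dimensions.
    q_rot = q * cos + rotate_half(q) * sin
    k_rot = k * cos + rotate_half(k) * sin
    return q_rot, k_rot

# RoPE happens here, before any score matrix is formed; whatever attention
# pattern comes next (dense, or LSG's local/sparse blocks) is unaffected.
q = torch.randn(1, 8, 128, 64)  # (batch, heads, seq_len, head_dim)
k = torch.randn(1, 8, 128, 64)
q_rot, k_rot = apply_rope(q, k, torch.arange(128), head_dim=64)
scores = q_rot @ k_rot.transpose(-2, -1) / 64 ** 0.5
```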