ccdv-ai / convert_checkpoint_to_lsg

Efficient Attention for Long Sequence Processing
MIT License

Convert llama to Long llama #5

Open jakebonk opened 1 year ago

jakebonk commented 1 year ago

Do you have any thoughts on converting llama/alpaca/vicuna models with LSG? Or would this be more difficult? It looks like they use rotary positional embeddings (RoPE) instead of absolute positional embeddings.

ccdv-ai commented 1 year ago

Hi @jakebonk

The HF team added the Llama model a few days ago. From what I see in this implementation, it is likely possible to add LSG attention to a Llama model. RoPE isn't a problem, since you can apply it to the queries and keys before computing the score matrix.
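To illustrate why the ordering works, here is a minimal pure-Python sketch, not code from this repository or from HF. RoPE rotates each query and key using only that token's own absolute position, so once `q` and `k` are rotated, any sparse pattern can pick whichever (query, key) pairs it needs and take plain dot products. Assumptions: the helper names `apply_rope` and `local_block_scores` are hypothetical; only LSG's local-block component is shown (the query's block plus its two neighbours), with global and sparse tokens omitted; the rotation uses the interleaved-pair layout, whereas HF's Llama code pairs dimension `i` with `i + d/2`, which is equivalent up to a permutation of the head dimensions.

```python
import math
import random


def apply_rope(x, pos, base=10000.0):
    """Rotate each pair (x[2i], x[2i+1]) by the angle pos * base**(-2i/d)."""
    d = len(x)
    assert d % 2 == 0, "head dimension must be even"
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def local_block_scores(q, k, block_size):
    """Scaled scores only for keys in the query's block and its neighbours.

    q, k: lists of per-token vectors that are ALREADY RoPE-rotated.
    Returns {(query_idx, key_idx): score} for the allowed pairs only.
    """
    n, d = len(q), len(q[0])
    scale = 1.0 / math.sqrt(d)
    n_blocks = math.ceil(n / block_size)
    scores = {}
    for qi in range(n):
        blk = qi // block_size
        lo = max(0, blk - 1) * block_size                  # start of previous block
        hi = min(n, min(n_blocks, blk + 2) * block_size)   # end of next block
        for ki in range(lo, hi):
            scores[(qi, ki)] = dot(q[qi], k[ki]) * scale
    return scores


# Toy example: 8 tokens, head dim 4, blocks of 2 tokens.
random.seed(0)
n, d, block = 8, 4, 2
q_raw = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
k_raw = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]

# Step 1: apply RoPE with each token's absolute position.
q = [apply_rope(v, pos) for pos, v in enumerate(q_raw)]
k = [apply_rope(v, pos) for pos, v in enumerate(k_raw)]

# Step 2: compute the sparse (local-block) score matrix from the rotated vectors.
scores = local_block_scores(q, k, block)
```

Because `dot(apply_rope(u, m), apply_rope(v, n))` depends only on `m - n`, the local scores still carry relative-position information even though most pairs are never computed. A real conversion would do the same thing with batched PyTorch tensors inside the attention module.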

I need to investigate this; I'll let you know within the week whether I can create a conversion script.