CStanKonrad / long_llama

LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA and fine-tuned with the Focused Transformer (FoT) method.
Apache License 2.0

Could LongNet be easily applied to the attention with FoT? #3

Open jebarpg opened 1 year ago

jebarpg commented 1 year ago

https://arxiv.org/abs/2307.02486 The LongNet "scaling to 1 billion tokens" paper, combined with this, seems like it could go a long way toward the pursuit of effectively unbounded context length. FoT also feels similar to L2P (Learning to Prompt), which maintains a pool of prompts to mitigate forgetting when applying continual learning to a model... Maybe the database of (key, value) pairs accessed via kNN could also be blended with L2P... Plus the LongNet dilated attention algorithm could likely benefit from contrastive learning too.
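
To make the kNN idea concrete, here is a minimal sketch (not from the LongLLaMA codebase; the function name, shapes, and exact-search scoring are my own assumptions) of how queries could attend over an external (key, value) memory retrieved via kNN, roughly in the spirit of FoT's memory layers:

```python
import torch

def knn_memory_lookup(queries, mem_keys, mem_values, k=4):
    # queries: (num_q, d); mem_keys, mem_values: (mem_size, d)
    # Score every memory key against every query (exact search here;
    # a real system would typically use an approximate kNN index).
    scores = queries @ mem_keys.T                      # (num_q, mem_size)
    top_scores, top_idx = scores.topk(k, dim=-1)       # (num_q, k)
    top_values = mem_values[top_idx]                   # (num_q, k, d)
    # Attend only over the k retrieved entries per query.
    weights = torch.softmax(top_scores, dim=-1)        # (num_q, k)
    return torch.einsum("qk,qkd->qd", weights, top_values)

# Toy usage: 2 queries against a memory of 1024 cached (key, value) pairs.
q = torch.randn(2, 64)
out = knn_memory_lookup(q, torch.randn(1024, 64), torch.randn(1024, 64))
```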

Thoughts?

syzymon commented 1 year ago

Hi, thanks for your interest in our work! From my understanding of the LongNet paper, the main idea of FoT, which is training on negative examples while utilizing a longer context, and the dilated attention from LongNet seem pretty orthogonal, which would make combining these two methods an interesting research direction to explore!
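
For intuition about why the two seem orthogonal, here is a rough sketch (my own illustration, not from either codebase; LongNet additionally mixes multiple segment lengths and dilation rates) of a single dilated, causal attention mask. In principle such a mask could sparsify attention over the local context, while an FoT-style memory layer separately attends to kNN-retrieved keys:

```python
import torch

def dilated_attention_mask(seq_len, segment_len, dilation):
    # Boolean mask: True where a query position may attend to a key position.
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    for start in range(0, seq_len, segment_len):
        end = min(start + segment_len, seq_len)
        # Within each segment, keep only every `dilation`-th key position.
        kept = torch.arange(start, end, dilation)
        mask[start:end, kept] = True
    # Apply the causal restriction on top of the dilated pattern.
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    return mask & causal

# Toy usage: 16 tokens, segments of 8, keep every 2nd key in each segment.
print(dilated_attention_mask(16, segment_len=8, dilation=2).int())
```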