LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA and fine-tuned with the Focused Transformer (FoT) method.
Can FoT only be used for pre-training, or can it also be used for instruction fine-tuning? #13
In particular, I don't understand how cross-batch data is loaded during training.
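For context, the FoT paper trains a memory attention layer on "cross-batch" data: each document's queries attend to (key, value) pairs from its own context plus contexts taken from other documents in the batch, which serve as negatives for the contrastive objective. The sketch below is only an illustration of how such a per-example memory could be assembled; `build_cross_batch_memory`, the `d_prev` parameter, and the tensor shapes are assumptions for this example, not LongLLaMA's actual data-loading code.

```python
import torch

def build_cross_batch_memory(keys, values, d_prev=2):
    """Illustrative sketch of FoT-style cross-batch memory assembly.

    keys, values: [batch, seq_len, dim] tensors holding the current
    local context of each document in the batch.

    Returns, for each batch element, a (mem_keys, mem_values) pair that
    mixes the element's own context (positives) with contexts from
    d_prev other batch elements (negatives).
    """
    batch, seq_len, dim = keys.shape
    memories = []
    for i in range(batch):
        # Positives: this document's own current context.
        mem_k = [keys[i]]
        mem_v = [values[i]]
        # Negatives: contexts borrowed from d_prev other documents.
        for j in range(1, d_prev + 1):
            other = (i + j) % batch
            mem_k.append(keys[other])
            mem_v.append(values[other])
        memories.append((torch.cat(mem_k, dim=0),
                         torch.cat(mem_v, dim=0)))
    return memories

# Example: batch of 4 documents, 128-token contexts, 64-dim heads.
k = torch.randn(4, 128, 64)
v = torch.randn(4, 128, 64)
mem = build_cross_batch_memory(k, v, d_prev=2)
print(mem[0][0].shape)  # torch.Size([384, 64]): own context + 2 negatives
```

In the actual FoT setup the memory also accumulates keys and values from the same document's previous context windows, and training encourages queries to prefer their own document's keys over the cross-batch negatives; see the Focused Transformer paper for the full procedure.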