LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA and fine-tuned with the Focused Transformer (FoT) method.
Can FoT only be used for pre-training, or can it also be used for instruction fine-tuning? #13
In particular, I don't understand how cross-batch data is loaded during training.
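For context, the FoT paper trains a memory attention layer on "cross-batch" data: each document's queries attend to (key, value) pairs from its own context plus contexts taken from other documents in the batch, which serve as negatives for the contrastive objective. The sketch below is only an illustration of how such a per-example memory could be assembled; `build_cross_batch_memory`, the `d_prev` parameter, and the tensor shapes are assumptions for this example, not LongLLaMA's actual data-loading code.

```python
import torch

def build_cross_batch_memory(keys, values, d_prev=2):
    """Illustrative sketch of FoT-style cross-batch memory assembly.

    keys, values: [batch, seq_len, dim] tensors holding the current
    local context of each document in the batch.

    Returns, for each batch element, a (mem_keys, mem_values) pair that
    mixes the element's own context (positives) with contexts from
    d_prev other batch elements (negatives).
    """
    batch, seq_len, dim = keys.shape
    memories = []
    for i in range(batch):
        # Positives: this document's own current context.
        mem_k = [keys[i]]
        mem_v = [values[i]]
        # Negatives: contexts borrowed from d_prev other documents.
        for j in range(1, d_prev + 1):
            other = (i + j) % batch
            mem_k.append(keys[other])
            mem_v.append(values[other])
        memories.append((torch.cat(mem_k, dim=0),
                         torch.cat(mem_v, dim=0)))
    return memories

# Example: batch of 4 documents, 128-token contexts, 64-dim heads.
k = torch.randn(4, 128, 64)
v = torch.randn(4, 128, 64)
mem = build_cross_batch_memory(k, v, d_prev=2)
print(mem[0][0].shape)  # torch.Size([384, 64]): own context + 2 negatives
```

In the actual FoT setup the memory also accumulates keys and values from the same document's previous context windows, and training encourages queries to prefer their own document's keys over the cross-batch negatives; see the Focused Transformer paper for the full procedure.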