It seems like dummy_converstions in the repository is far shorter than 16k. I'm wondering how such short conversation data is able to fine-tune 16k long-context models?
@Arist12 The dummy data is not for training; it is for testing whether you set up the pipeline correctly. In actual training, you will use long conversations, code, etc.