Feature request
Pack SFT data samples together when several of them fit into a single context window. This is similar to #187.
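To make the request concrete, here is a minimal sketch of one way the packing could work, assuming samples are already tokenized into lists of token IDs. The function name `pack_samples` and its parameters are hypothetical and not taken from the existing codebase. It uses a simple first-fit-decreasing heuristic, which is only one possible strategy.

```python
from typing import List


def pack_samples(samples: List[List[int]], context_size: int) -> List[List[List[int]]]:
    """Group tokenized samples so each group's total length fits in context_size.

    Hypothetical helper: first-fit-decreasing bin packing over sample lengths.
    """
    groups: List[List[List[int]]] = []  # each group holds samples sharing one context
    used: List[int] = []                # tokens already used in each group

    # Place the longest samples first; this tends to leave less wasted space.
    for sample in sorted(samples, key=len, reverse=True):
        # A sample longer than the context still has to be truncated.
        sample = sample[:context_size]
        for i, length in enumerate(used):
            if length + len(sample) <= context_size:
                groups[i].append(sample)
                used[i] += len(sample)
                break
        else:
            # No existing group has room, so start a new one.
            groups.append([sample])
            used.append(len(sample))
    return groups
```

Each group would then be concatenated and padded once, up to the context size. A real implementation would probably also need to stop samples in the same group from attending to each other, for example through the attention mask; that part is not shown here.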
Motivation
Currently, we pad or truncate every instruction-tuning example to the full context size. Packing conversations that fit together within one context would waste fewer tokens on padding and so be more computationally efficient.