Modalities / modalities

Modalities, a PyTorch-native framework for distributed and reproducible foundation model training.
MIT License
63 stars 8 forks source link

Packing for SFT data #226

Open lllAlexanderlll opened 3 months ago

lllAlexanderlll commented 3 months ago

Feature request

Pack SFT data samples that fit into one context size together. See similarity to #187

Motivation

Currently, we pad and truncate instruction-tuning examples to the whole context size. Packaging conversations that fit fully in one context together would be more computationally efficient.