Closed epwalsh closed 1 month ago
Adds support for document masking during training via flash-attn. This is activated when the flag `--data.generate_doc_lengths` is set. The code changes were adapted from https://github.com/yuzhaouoe/pretraining-data-packing.
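For readers unfamiliar with document masking, here is a minimal sketch of the idea, not the code in this PR. It assumes each packed training sequence comes with a list of per-document lengths, turns those into cumulative boundaries (the `cu_seqlens` form that flash-attn's variable-length interface consumes), and builds a reference boolean mask showing which tokens may attend to which. The helper names `cu_seqlens_from_doc_lengths` and `document_mask` are hypothetical and chosen only for illustration.

```python
from itertools import accumulate


def cu_seqlens_from_doc_lengths(doc_lengths):
    """Return cumulative document boundaries: [0, l1, l1 + l2, ...].

    Hypothetical helper; illustrates the boundary format only.
    """
    return [0] + list(accumulate(doc_lengths))


def document_mask(doc_lengths):
    """Return a causal mask restricted to document boundaries.

    mask[i][j] is True only if tokens i and j belong to the same
    document and j <= i. Hypothetical reference implementation; a
    fused kernel would not materialize this matrix.
    """
    # Assign each token position the index of the document it belongs to.
    doc_ids = [d for d, n in enumerate(doc_lengths) for _ in range(n)]
    total = len(doc_ids)
    return [
        [doc_ids[i] == doc_ids[j] and j <= i for j in range(total)]
        for i in range(total)
    ]


# Two documents of lengths 3 and 2 packed into one sequence of 5 tokens.
doc_lengths = [3, 2]
cu_seqlens = cu_seqlens_from_doc_lengths(doc_lengths)
mask = document_mask(doc_lengths)
```

In this toy case the first token of the second document (position 3) cannot attend to the last token of the first document (position 2), which is the behavior document masking is meant to enforce.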