Modalities: a framework for training multimodal foundation models.

Fix/sequence length power of 2 #158

Closed: le1nux closed this 1 week ago

le1nux commented 3 weeks ago

What does this PR do?

Previously, the block_size in the dataset was set to a power of two, which made the effective sequence length block_size - 1 (e.g., 1023 instead of 1024). A non-power-of-two sequence length is not best practice and can impact model training, e.g., throughput-wise.

As a fix, we now specify the sequence_length in the config instead of the block_size. During dataset instantiation, we choose block_size = sequence_length + 1, so that the input sequence and the shifted target sequence each contain exactly sequence_length tokens.
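A minimal sketch of this relationship, assuming the usual shift-by-one language-modeling setup (variable names are illustrative, not the actual Modalities API):

```python
# One block holds sequence_length + 1 tokens so that the shifted
# input/target views each contain exactly sequence_length tokens.

sequence_length = 1024             # value configured by the user
block_size = sequence_length + 1   # chosen at dataset instantiation

tokens = list(range(block_size))   # one block of token ids
inputs = tokens[:-1]               # model input, length == 1024
targets = tokens[1:]               # next-token targets, length == 1024
assert len(inputs) == len(targets) == sequence_length
```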

Previously, we also chunked the dataset into non-overlapping chunks of block_size tokens, and each chunk was used for training individually. As a result, the last token of a block was only ever used as a target, never as an input. We changed this so that the last token of each block is reused as the first token of the subsequent block, as sketched below.
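A sketch of the overlapped chunking, assuming a flat stream of token ids (the helper name chunk_with_overlap is hypothetical):

```python
# Consecutive blocks share one boundary token, so the token that ends one
# block (used as a target there) also starts the next block (used as an input).
def chunk_with_overlap(token_ids: list[int], block_size: int) -> list[list[int]]:
    stride = block_size - 1  # advance by sequence_length, reusing the boundary token
    return [
        token_ids[i : i + block_size]
        for i in range(0, len(token_ids) - block_size + 1, stride)
    ]

# Example: with block_size=4, token 3 ends the first block and starts the second.
print(chunk_with_overlap(list(range(10)), block_size=4))
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```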

General changes

Breaking Changes

Checklist before submitting final PR