Previously, the block_size in the dataset was set to a power of two, which made the effective sequence length block_size - 1. A sequence length that is not a power of two is not best practice and can hurt model training, e.g., in terms of throughput.
As a fix, we now specify sequence_length in the config instead of block_size. During Dataset instantiation, block_size is set to sequence_length + 1.
Previously, we also chunked the dataset into non-overlapping chunks of length block_size, and each chunk was used for training individually. As a result, the last token of a block was only ever used as a target and never as an input. We changed this so that the last token of a block is reused as the first token of the subsequent block.
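To illustrate why block_size is sequence_length + 1, here is a minimal sketch. It assumes next-token prediction in which inputs and targets come from the same block, shifted by one token. The function name split_block is hypothetical and not taken from the codebase.

```python
def split_block(block):
    """Split one block of token ids into inputs and one-step-shifted targets."""
    inputs = block[:-1]   # every token except the last
    targets = block[1:]   # every token except the first
    return inputs, targets


sequence_length = 8               # power of two, as specified in the config
block_size = sequence_length + 1  # one extra token so targets can be shifted

block = list(range(block_size))   # stand-in for real token ids
inputs, targets = split_block(block)

# Both sequences the model sees have exactly sequence_length tokens.
print(len(inputs), len(targets))
```

With the old scheme (block_size = 8), the same split would yield sequences of length 7.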
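The difference between the two chunking schemes can be sketched as follows. This is an illustration under assumptions, not the actual Dataset implementation: chunk_tokens, input_tokens, and the reuse_last_token flag are hypothetical names, and incomplete trailing blocks are simply dropped.

```python
def chunk_tokens(tokens, block_size, reuse_last_token=True):
    """Cut a token stream into blocks of length block_size.

    With reuse_last_token=True, consecutive blocks overlap by one token,
    so the last token of a block is also the first token of the next one.
    """
    if block_size < 2:
        raise ValueError("block_size must be at least 2")
    stride = block_size - 1 if reuse_last_token else block_size
    last_start = len(tokens) - block_size
    return [tokens[s:s + block_size] for s in range(0, last_start + 1, stride)]


def input_tokens(blocks):
    """Collect every token that appears in an input position (all but the last)."""
    return sorted({tok for block in blocks for tok in block[:-1]})


tokens = list(range(9))  # stand-in for real token ids

old_blocks = chunk_tokens(tokens, block_size=3, reuse_last_token=False)
new_blocks = chunk_tokens(tokens, block_size=3, reuse_last_token=True)

print(old_blocks)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(new_blocks)  # [[0, 1, 2], [2, 3, 4], [4, 5, 6], [6, 7, 8]]
```

In the non-overlapping case, tokens 2 and 5 only ever appear as targets. With the one-token overlap, every token except the final one of the stream is also used as an input.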
General changes
None apart from the points mentioned above.
Breaking Changes
Replaced block_size with sequence_length in Dataset, Model, and NumberConversion.
Checklist before submitting final PR
[x] My PR is minimal and addresses one issue / enhancement in isolation
[ ] I have merged main into this feature branch
[x] I have reviewed my own code w.r.t. correct implementation, missing type hints, proper documentation, etc.
[x] I have run a sample config for model training
[x] I have fixed all failing tests (python tests/tests.py)