Closed le1nux closed 1 week ago
fixes #163
Yes, the inheritance structure can be improved. I suggest we do this in a separate PR, together with improving the "packing" terminology in cases where no actual packing happens.
I added the issue https://github.com/Modalities/modalities/issues/167 for addressing this.
What does this PR do?
The index values in the pbin files were wrong. They already started at an offset, and on top of that we added another offset of HEADER size when reading from the file buffer, so the offset was effectively applied twice. See here for the initial offset during pbin index creation: https://github.com/Modalities/modalities/blob/4aa2e88efe13c3eaab4c6b425fdb82caf0d2a443/src/modalities/dataloader/create_packed_data.py#L145
and the additional offset that is used when reading from the memmap during training:
https://github.com/Modalities/modalities/blob/4aa2e88efe13c3eaab4c6b425fdb82caf0d2a443/src/modalities/dataloader/create_packed_data.py#L262
This PR fixes this issue and makes the index always start at byte 0, only applying the offset once when reading from the memmap file.
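To illustrate the intended behavior, here is a minimal sketch and not the actual Modalities code. The 12-byte `HEADER_SIZE`, the header layout, and the helper names `build_file`, `load_index`, and `read_sample` are all assumptions made for this example. The index stores offsets relative to the data section (starting at byte 0), and the header offset is added exactly once when reading.

```python
import pickle
import struct

# Assumed header size in bytes (illustrative; not necessarily the real pbin layout).
HEADER_SIZE = 12


def build_file(samples: list[bytes]) -> bytes:
    """Pack samples behind a header; index offsets are relative to the data section."""
    index = []
    offset = 0  # the index always starts at byte 0 of the data section
    for sample in samples:
        index.append((offset, len(sample)))
        offset += len(sample)
    data = b"".join(samples)
    # Hypothetical header: 8 bytes data-section length + 4 unused bytes.
    header = struct.pack("<QI", len(data), 0)
    return header + data + pickle.dumps(index)


def load_index(buffer: bytes) -> list[tuple[int, int]]:
    """Read the data-section length from the header and unpickle the trailing index."""
    (data_len,) = struct.unpack_from("<Q", buffer, 0)
    return pickle.loads(buffer[HEADER_SIZE + data_len:])


def read_sample(buffer: bytes, index: list[tuple[int, int]], i: int) -> bytes:
    """Return sample i; the header offset is applied once, here at read time."""
    offset, length = index[i]
    start = HEADER_SIZE + offset
    return buffer[start:start + length]


samples = [b"abc", b"de", b"fghi"]
buf = build_file(samples)
index = load_index(buf)
restored = [read_sample(buf, index, i) for i in range(len(samples))]
```

If the index were instead built starting at `HEADER_SIZE` while the reader also added `HEADER_SIZE`, every read in this sketch would be shifted by the header size, which is the kind of double offset described above.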
General changes
Removed block_size from abstract classes that don't need to see the block_size concept
Breaking Changes
Checklist before submitting final PR
python tests/tests.py