allenai / dolma

Data and tools for generating and inspecting OLMo pre-training data.
https://allenai.github.io/dolma/
Apache License 2.0
1.02k stars, 108 forks

Fix Tests to pass with new mixer behavior #184

Closed. soldni closed this 3 months ago

soldni commented 3 months ago

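The mixer now keeps the attributes field in all cases, even when the attributes files contain no attributes (unless attributes are explicitly dropped). This makes the output more consistent, but the existing tests, which expected the field to be absent in that case, now fail and need to be updated.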
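To illustrate the behavior described above, here is a minimal Python sketch. It is not dolma's actual mixer code: the function name `mix_document`, its parameters, and the document layout are all hypothetical and chosen only to show why a test written against the old behavior would need its expected output changed.

```python
from typing import Any, Dict, List


def mix_document(
    document: Dict[str, Any],
    attribute_records: List[Dict[str, Any]],
    drop_attributes: bool = False,
) -> Dict[str, Any]:
    """Hypothetical stand-in for the mixer's merge step (not dolma's API)."""
    output = dict(document)

    # Attributes are removed only when dropping is requested explicitly.
    if drop_attributes:
        output.pop("attributes", None)
        return output

    # Start from attributes already on the document, then merge in
    # whatever each attributes record provides (possibly nothing).
    merged: Dict[str, Any] = dict(document.get("attributes") or {})
    for record in attribute_records:
        merged.update(record.get("attributes") or {})

    # Assumed new behavior: the field is always present, even when empty.
    output["attributes"] = merged
    return output


doc = {"id": "1", "text": "hello"}
empty_attrs = [{"id": "1", "attributes": {}}]

kept = mix_document(doc, empty_attrs)
dropped = mix_document(doc, empty_attrs, drop_attributes=True)
```

Under these assumptions, `kept` contains an empty `attributes` dictionary, while `dropped` has no `attributes` key at all. A test fixture that previously expected no `attributes` key for documents with empty attributes files would need to expect the empty field instead.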