centre-for-humanities-computing/danish-foundation-models
A project for training foundational Danish language models
https://foundationmodels.dk
MIT License
Add datatrove pipeline blocks #280
Closed (peterbjorgensen closed 3 weeks ago)
peterbjorgensen commented 1 month ago
Add data pipeline blocks for datatrove:
- PII formatter that also removes Danish CPR numbers
- Port of, or wrapper around, the dolma domain filters for datatrove
- Adapter function to load the dolma data format into datatrove
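As a rough illustration of two of the items above, here is a minimal sketch in plain Python. It is not the code that closed this issue and it does not call datatrove or dolma. The names `mask_cpr`, `dolma_to_datatrove_dict`, and the `[CPR]` placeholder are hypothetical. The sketch assumes that a CPR number is written as DDMMYY followed by an optional hyphen or space and four digits, that a dolma record is a dict with `id`, `text`, and optional `source`, `added`, `created`, and `metadata` keys, and that the target is a dict with `text`, `id`, and `metadata`.

```python
import re

# Assumed CPR shape: DDMMYY, an optional hyphen or space, then four digits.
# Only the day (01-31) and month (01-12) ranges are checked; there is no
# calendar or checksum validation, so false positives are possible.
CPR_PATTERN = re.compile(
    r"\b(?:0[1-9]|[12]\d|3[01])(?:0[1-9]|1[0-2])\d{2}[- ]?\d{4}\b"
)


def mask_cpr(text: str, placeholder: str = "[CPR]") -> str:
    """Replace anything that looks like a CPR number with a placeholder."""
    return CPR_PATTERN.sub(placeholder, text)


def dolma_to_datatrove_dict(record: dict) -> dict:
    """Map an (assumed) dolma-style record to a text/id/metadata dict.

    Top-level provenance fields are folded into metadata so they are kept.
    """
    metadata = dict(record.get("metadata") or {})
    for key in ("source", "added", "created"):
        if key in record:
            metadata[key] = record[key]
    return {
        "text": record["text"],
        "id": str(record["id"]),
        "metadata": metadata,
    }


if __name__ == "__main__":
    print(mask_cpr("Mit CPR er 010190-1234."))
    print(dolma_to_datatrove_dict({"id": 7, "text": "hej", "source": "web"}))
```

A real pipeline block would wrap functions like these in whatever formatter and reader interfaces datatrove provides; those interfaces are deliberately left out here because the issue does not describe them.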