centre-for-humanities-computing / danish-foundation-models

A project for training foundational Danish language models
https://foundationmodels.dk
MIT License

I can't find the raw datasets #286

Open TTTTao725 opened 5 months ago

TTTTao725 commented 5 months ago

I'm trying to run the conversion scripts to correct all the timestamps, but I couldn't find the following raw datasets.

peterbjorgensen commented 5 months ago

@TTTTao725 The raw datasets are on a separate mount. We should probably standardise how the scripts read the raw data. Some of the scripts download directly from HuggingFace; in that case I don't think we need to store the raw data as well.