KennethEnevoldsen closed this issue 5 months ago
Have we considered: https://huggingface.co/datasets/uonlp/CulturaX
It seems to be mostly Common Crawl content, so I could imagine that Colossal C4 is probably a better solution.
@peterbjorgensen I am assuming we have. If that is the case, we can simply close this issue (I am opening it just so that there is a record of it in the future).
@peterbjorgensen I am assuming we don't want to include this, so I will close the issue.