I was wondering if it'd be possible to upload the tokenized dataset. I tried following the instructions under the Pretraining header but had trouble installing MegaBlocks due to a CUDA version mismatch. In any case, I think uploading the tokenized dataset to Hugging Face would be very helpful, since it would save others the work.