EleutherAI / pythia

The hub for EleutherAI's work on interpretability and learning dynamics
Apache License 2.0

Error when running unshard_memmap.py #114

Closed: ShaneeyS closed this issue 11 months ago

ShaneeyS commented 1 year ago

Hi, when I try to run the following command:

python utils/unshard_memmap.py --input_file ./pythia_deduped_pile_idxmaps/pile_0.87_deduped_text_document-00000-of-00082.bin --num_shards 83 --output_dir ./pythia_pile_idxmaps/

it always fails with the following error:

pythia_deduped_pile_idxmaps/pile_0.87_deduped_text_document-00023-of-00082.bin
 29%| | 24/83 [6:09:46<15:01:09, 916.43s/it]
pythia_deduped_pile_idxmaps/pile_0.87_deduped_text_document-00024-of-00082.bin
 30%| | 25/83 [6:25:14<14:49:06, 919.76s/it]
pythia_deduped_pile_idxmaps/pile_0.87_deduped_text_document-00025-of-00082.bin
 31%| | 26/83 [6:40:51<14:38:46, 925.03s/it]
pythia_deduped_pile_idxmaps/pile_0.87_deduped_text_document-00026-of-00082.bin
 33%| | 27/83 [6:56:36<14:28:56, 931.02s/it]
pythia_deduped_pile_idxmaps/pile_0.87_deduped_text_document-00027-of-00082.bin
 34%| | 28/83 [7:12:14<14:15:21, 933.12s/it]
pythia_deduped_pile_idxmaps/pile_0.87_deduped_text_document-00028-of-00082.bin
 35%| | 29/83 [7:28:12<14:06:25, 940.47s/it]
pythia_deduped_pile_idxmaps/pile_0.87_deduped_text_document-00029-of-00082.bin
 36%| | 30/83 [7:44:13<13:56:16, 946.72s/it]
pythia_deduped_pile_idxmaps/pile_0.87_deduped_text_document-00030-of-00082.bin
 37%| | 31/83 [8:00:12<13:43:41, 950.42s/it]
pythia_deduped_pile_idxmaps/pile_0.87_deduped_text_document-00031-of-00082.bin
Bus error (core dumped)

Could you please tell me how to solve this problem?

Lisennlp commented 11 months ago

I also ran into this problem. In my case it was caused by insufficient hard disk space.

StellaAthena commented 11 months ago

The dataset is very big and unpacking it requires even more space. We recommend using a drive with 2 TB of available space (the final product takes up about 1.6 TB).
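
Since the failure in this thread only appeared about eight hours into the run, a pre-flight check on free space may save time. The following is a minimal sketch, not part of the pythia repository: the helper names `total_shard_bytes` and `has_room` are hypothetical, and it assumes the merged output needs roughly the combined size of the shards plus some headroom. The default factor of 1.25 is an assumption, chosen only because 1.6 TB × 1.25 matches the 2 TB recommendation above.

```python
import shutil
from pathlib import Path


def total_shard_bytes(shard_dir, pattern="*.bin"):
    """Sum the on-disk sizes of all files in shard_dir matching pattern."""
    return sum(p.stat().st_size for p in Path(shard_dir).glob(pattern) if p.is_file())


def has_room(required_bytes, output_dir, headroom=1.25):
    """Return True if the filesystem holding output_dir has at least
    required_bytes * headroom free. The headroom factor is an assumption."""
    free = shutil.disk_usage(output_dir).free
    return free >= required_bytes * headroom


if __name__ == "__main__":
    # Paths taken from the command in the original report; adjust as needed.
    shard_dir = Path("./pythia_deduped_pile_idxmaps")
    output_dir = Path("./pythia_pile_idxmaps")
    if shard_dir.is_dir() and output_dir.is_dir():
        needed = total_shard_bytes(shard_dir)
        ok = has_room(needed, output_dir)
        print(f"shards total ~{needed / 1e12:.2f} TB, enough free space: {ok}")
    else:
        print("shard or output directory not found; adjust the paths above")
```

If the check reports too little space, pointing `--output_dir` at a larger drive before starting is likely cheaper than discovering the problem partway through.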