18907305772 / FuseAI

FuseAI Project
https://huggingface.co/FuseAI

How should I load minipile? #15

Closed: BlueCestbon closed this issue 3 months ago

BlueCestbon commented 3 months ago

This may sound like a basic question, but I have tried many ways to solve it.

Error:
raise FileNotFoundError(
FileNotFoundError: Directory /FuseAI/data/minipile/data is neither a Dataset directory nor a DatasetDict directory.
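For context, that message matches the one `datasets.load_from_disk()` raises when a directory lacks the metadata files written by `save_to_disk()`. A folder of raw parquet shards downloaded from the Hub contains none of them. Below is a minimal, stdlib-only sketch that assumes this behavior. `classify_dir` mirrors the check; the marker names `dataset_info.json`, `state.json`, and `dataset_dict.json` are our reading of the library source, not something stated in this issue. `parquet_data_files` is a hypothetical helper that groups shards named like `train-00000-of-00012.parquet` (optionally with a hash suffix) by split.

```python
import os
import re
from collections import defaultdict

# Marker files that, as far as we can tell from the `datasets` source,
# save_to_disk() writes and load_from_disk() checks for (assumption).
DATASET_MARKERS = ("dataset_info.json", "state.json")
DATASETDICT_MARKER = "dataset_dict.json"

# Assumed shard naming: "<split>-00000-of-00012.parquet", optionally with a
# hash suffix before ".parquet" (hypothetical; adjust to the real file names).
SHARD_RE = re.compile(r"^(?P<split>[A-Za-z]+)-\d+-of-\d+.*\.parquet$")


def classify_dir(path):
    """Return 'Dataset', 'DatasetDict', or 'raw' for a local directory."""
    if all(os.path.isfile(os.path.join(path, m)) for m in DATASET_MARKERS):
        return "Dataset"
    if os.path.isfile(os.path.join(path, DATASETDICT_MARKER)):
        return "DatasetDict"
    # Neither layout: the case where load_from_disk() raises
    # "... is neither a Dataset directory nor a DatasetDict directory."
    return "raw"


def parquet_data_files(path):
    """Group parquet shards by split into a {split: [file, ...]} mapping."""
    files = defaultdict(list)
    for name in sorted(os.listdir(path)):
        match = SHARD_RE.match(name)
        if match:
            files[match.group("split")].append(os.path.join(path, name))
    return dict(files)
```

If the directory classifies as `raw`, the grouped mapping could be passed to the generic parquet loader, e.g. `load_dataset("parquet", data_files=parquet_data_files("/FuseAI/data/minipile/data"))`. That reads the shards individually instead of expecting a `save_to_disk()` layout.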

I also tried merging the parquet files into one and then calling datasets.load_dataset(), but got this error:
File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowCapacityError: array cannot contain more than 2147483646 bytes, have 2149036720
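The numbers in that traceback can be checked directly. Arrow's `string` and `binary` types use 32-bit offsets, so a single array tops out at the 2147483646 bytes reported here (2**31 - 2). The `large_string` and `large_binary` types use 64-bit offsets instead. Below is a small arithmetic sketch, assuming the merged text column was being built as one such 32-bit-offset array. `overflow` and `min_chunks` quantify the gap. The hypothetical `pack_rows` shows greedy batching of row sizes so that no batch crosses the limit, which is roughly what keeping the original shards separate achieves.

```python
import math

# Per-array byte limit reported in the traceback (2**31 - 2) for Arrow's
# 32-bit-offset string/binary arrays.
ARROW_BINARY_LIMIT = 2**31 - 2  # 2147483646


def overflow(total_bytes, limit=ARROW_BINARY_LIMIT):
    """Bytes by which a single contiguous array would exceed the limit."""
    return max(0, total_bytes - limit)


def min_chunks(total_bytes, limit=ARROW_BINARY_LIMIT):
    """Lower bound on how many arrays the data must be split across."""
    return max(1, math.ceil(total_bytes / limit))


def pack_rows(row_sizes, limit=ARROW_BINARY_LIMIT):
    """Greedily group consecutive row indices so each group stays <= limit bytes."""
    groups, current, used = [], [], 0
    for i, size in enumerate(row_sizes):
        if size > limit:
            raise ValueError(f"row {i} alone exceeds the limit")
        # Start a new group when adding this row would cross the limit.
        if current and used + size > limit:
            groups.append(current)
            current, used = [], 0
        current.append(i)
        used += size
    if current:
        groups.append(current)
    return groups
```

For the reported size, `overflow(2149036720)` gives 1553074 and `min_chunks(2149036720)` gives 2. The merged data is therefore only about 1.5 MB over the limit, and any split into two or more roughly even pieces would fit.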