The problem may sound low-level, but I have already tried many ways to solve it.
Error:
    raise FileNotFoundError(
FileNotFoundError: Directory /FuseAI/data/minipile/data is neither a `Dataset` directory nor a `DatasetDict` directory.
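For context, this message appears to come from `datasets.load_from_disk()`. As far as I can tell, that function only accepts a directory written by `save_to_disk()`: one containing `dataset_info.json` and `state.json` (a `Dataset`), or `dataset_dict.json` (a `DatasetDict`). It does not accept a folder of raw parquet shards. The sketch below mirrors that check using only the standard library. `classify_dir` and `parquet_data_files` are hypothetical helpers, and the file names reflect my understanding of what the library looks for, not a guaranteed contract.

```python
import os

# File names that, to my understanding, datasets.load_from_disk() checks for.
# They are written by Dataset.save_to_disk() / DatasetDict.save_to_disk().
DATASET_FILES = ("dataset_info.json", "state.json")
DATASETDICT_FILE = "dataset_dict.json"


def classify_dir(path: str) -> str:
    """Return 'Dataset', 'DatasetDict', 'parquet', or 'unknown' for a directory.

    Hypothetical helper: it shows why a folder that only holds parquet
    shards is rejected by load_from_disk().
    """
    if all(os.path.isfile(os.path.join(path, f)) for f in DATASET_FILES):
        return "Dataset"
    if os.path.isfile(os.path.join(path, DATASETDICT_FILE)):
        return "DatasetDict"
    if any(name.endswith(".parquet") for name in os.listdir(path)):
        return "parquet"
    return "unknown"


def parquet_data_files(path: str) -> dict:
    """Collect the parquet shards in sorted order under a single 'train' split."""
    files = sorted(
        os.path.join(path, name)
        for name in os.listdir(path)
        if name.endswith(".parquet")
    )
    return {"train": files}
```

If `classify_dir` reports "parquet", passing the mapping to `datasets.load_dataset("parquet", data_files=parquet_data_files(path))` should read the shards directly, which makes merging them into one file unnecessary. The single "train" split name is an assumption.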
I then tried merging the parquet files into a single file and loading it with datasets.load_dataset(), but got this error:
File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowCapacityError: array cannot contain more than 2147483646 bytes, have 2149036720
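A likely reading of this second error: Arrow's regular `string`/`binary` arrays store offsets as signed 32-bit integers, so a single array chunk cannot hold much more than 2^31 bytes. The merged column ends up 2,149,036,720 − 2,147,483,646 = 1,553,074 bytes (about 1.5 MB) over that cap. Keeping the data in several smaller pieces avoids it. The sketch below groups text records so that each group's UTF-8 size stays under a byte budget. `plan_shards` is a hypothetical helper, and the default limit is simply the figure from the error message.

```python
# Regular Arrow string/binary arrays use 32-bit offsets, so one array chunk
# is capped at roughly 2**31 bytes. This value is copied from the error above.
ARROW_ARRAY_LIMIT = 2_147_483_646


def plan_shards(texts, max_bytes=ARROW_ARRAY_LIMIT):
    """Split texts into consecutive groups whose UTF-8 byte total is <= max_bytes.

    Hypothetical helper: each group could be written as its own parquet file
    instead of merging everything into one oversized array.
    """
    shards, current, size = [], [], 0
    for text in texts:
        n = len(text.encode("utf-8"))
        if n > max_bytes:
            raise ValueError("a single record exceeds max_bytes")
        # Start a new shard when adding this record would cross the budget.
        if current and size + n > max_bytes:
            shards.append(current)
            current, size = [], 0
        current.append(text)
        size += n
    if current:
        shards.append(current)
    return shards
```

Each group could then be written with `pyarrow.parquet.write_table()`. Alternatively, the text column could be cast to `pyarrow.large_string()`, which uses 64-bit offsets. Both are suggestions I have not verified against this dataset.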