Lightning-AI / litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
https://lightning.ai
Apache License 2.0

Continued pre-training fails with RuntimeError: Failed processing /tmp/data #1413

Open BestJiayi opened 6 months ago

BestJiayi commented 6 months ago

How can I solve this problem?

I downloaded pythia-160m from Hugging Face.
I downloaded my data as described in the official documentation. The program runs inside an NVIDIA Docker container.

I followed the official documentation for continued pre-training, but got an error: RuntimeError: Failed processing /tmp/data.

litgpt pretrain \
  --model_name pythia-160m \
  --tokenizer_dir checkpoints/EleutherAI/pythia-160m \
  --initial_checkpoint_dir checkpoints/EleutherAI/pythia-160m \
  --data TextFiles \
  --data.train_data_path "custom_texts" \
  --out_dir out/custom_model
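For reference, `--data.train_data_path` above points at a directory of plain-text files, while the traceback below shows a directory path reaching `open()`. The sketch below builds a throwaway `custom_texts` layout and lists the files a tokenizer step should receive. It assumes the TextFiles module picks up `*.txt` files directly inside that directory (non-recursively); `list_train_files` is a hypothetical helper, and its `is_file()` filter is an extra guard, not something litgpt is claimed to do.

```python
import glob
import tempfile
from pathlib import Path


def list_train_files(train_data_path: str) -> list[str]:
    # Hypothetical helper. Assumption: inputs are matched with a "*.txt" glob
    # directly inside the directory (non-recursive).
    candidates = sorted(glob.glob(str(Path(train_data_path) / "*.txt")))
    # Extra guard: keep regular files only, so a directory never reaches open().
    files = [f for f in candidates if Path(f).is_file()]
    if not files:
        raise FileNotFoundError(f"No .txt files found in {train_data_path}")
    return files


# Throwaway layout resembling the "custom_texts" directory from the command above.
root = Path(tempfile.mkdtemp()) / "custom_texts"
root.mkdir()
(root / "book1.txt").write_text("First document.\n", encoding="utf-8")
(root / "book2.txt").write_text("Second document.\n", encoding="utf-8")
(root / "archive.txt").mkdir()  # a directory whose name matches "*.txt"
(root / "notes").mkdir()        # an unrelated subdirectory, ignored by the glob

files = list_train_files(str(root))
print([Path(f).name for f in files])  # ['book1.txt', 'book2.txt']
```

Running a similar check against your own `custom_texts` directory can help rule out an empty or mis-pointed data path. It does not work around the litdata bug referenced later in the thread.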

I got:

RuntimeError: We found the following error

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/litdata/processing/data_processor.py", line 628, in _handle_data_chunk_recipe
    for item_data in item_data_or_generator:
  File "/usr/local/lib/python3.10/dist-packages/litdata/processing/functions.py", line 151, in _prepare_item_generator
    yield from self._fn(item_metadata) # type: ignore
  File "/usr/local/lib/python3.10/dist-packages/litgpt/data/text_files.py", line 124, in tokenize
    with open(filename, "r", encoding="utf-8") as file:
IsADirectoryError: [Errno 21] Is a directory: '/tmp/data'
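The innermost error is plain Python behavior: calling `open()` on a directory raises `IsADirectoryError`, which on Linux carries errno 21 (`EISDIR`). This minimal reproduction uses a temporary directory as a stand-in for `/tmp/data`. `read_text` copies the failing `open()` call from the traceback, and `read_text_checked` is a hypothetical guard that fails earlier with a clearer message.

```python
import errno
import os
import tempfile


def read_text(filename: str) -> str:
    # Same open() call as the failing line in the traceback above.
    with open(filename, "r", encoding="utf-8") as file:
        return file.read()


def read_text_checked(filename: str) -> str:
    # Hypothetical guard: reject directories before trying to open them.
    if os.path.isdir(filename):
        raise ValueError(f"Expected a text file but got a directory: {filename!r}")
    return read_text(filename)


data_dir = tempfile.mkdtemp()  # stands in for '/tmp/data'

try:
    read_text(data_dir)
except IsADirectoryError as err:
    # On Linux this is errno 21 (EISDIR), matching "[Errno 21] Is a directory".
    print(err.errno == errno.EISDIR, err.strerror)

try:
    read_text_checked(data_dir)
except ValueError as err:
    print(err)
```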

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/litdata/processing/data_processor.py", line 423, in run
    self._loop()
  File "/usr/local/lib/python3.10/dist-packages/litdata/processing/data_processor.py", line 472, in _loop
    self._handle_data_chunk_recipe(index)
  File "/usr/local/lib/python3.10/dist-packages/litdata/processing/data_processor.py", line 638, in _handle_data_chunk_recipe
    raise RuntimeError(f"Failed processing {self.items[index]}") from e
RuntimeError: Failed processing /tmp/data
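The "direct cause" wording comes from Python's explicit exception chaining: `raise RuntimeError(...) from e` stores the original exception on the new one's `__cause__`. The sketch below mimics that pattern with hypothetical `process_items` and `root_cause` helpers (they are not litdata's API) to show how the underlying `IsADirectoryError` can be recovered programmatically. This only works within a single process; the first RuntimeError in the log appears to carry the worker's traceback as text in its message, so in that case the cause has to be read from the message instead.

```python
import tempfile


def process_items(items, fn):
    # Hypothetical helper: wrap any per-item failure in a RuntimeError that
    # names the item, chaining the original exception with "from e".
    for item in items:
        try:
            fn(item)
        except Exception as e:
            raise RuntimeError(f"Failed processing {item}") from e


def root_cause(exc: BaseException) -> BaseException:
    # Follow explicit (__cause__) and implicit (__context__) links to the innermost exception.
    while True:
        nxt = exc.__cause__ or exc.__context__
        if nxt is None:
            return exc
        exc = nxt


def read_file(filename: str) -> str:
    with open(filename, "r", encoding="utf-8") as file:
        return file.read()


data_dir = tempfile.mkdtemp()  # stands in for '/tmp/data'

try:
    process_items([data_dir], read_file)
except RuntimeError as err:
    print(err)                             # "Failed processing " + the temp dir path
    print(type(root_cause(err)).__name__)  # IsADirectoryError on Linux
```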

carmocca commented 6 months ago

Same issue as in https://github.com/Lightning-AI/litgpt/issues/1402

cc @awaelchli

BestJiayi commented 6 months ago

@carmocca If I want to keep using litgpt for pre-training, what should I do in the meantime? Should I wait until the bug is fixed before using litgpt? Thank you!

carmocca commented 6 months ago

Are you using Google Colab? You could try using https://lightning.ai while this gets fixed. It should work there without issues.

BestJiayi commented 6 months ago

Thank you. I am currently using litgpt on our company's GPU cluster, so I will wait for the issue to be fixed before continuing to use it.