Xirider / finetune-gpt2xl

Guide: Finetune GPT2-XL (1.5 Billion Parameters) and finetune GPT-NEO (2.7 B) on a single GPU with Huggingface Transformers using DeepSpeed
MIT License

New issue with Pandas #14

Open barakw2021 opened 3 years ago

barakw2021 commented 3 years ago

I got this error:

Traceback (most recent call last):
  File "run_clm.py", line 478, in <module>
    main()
  File "run_clm.py", line 271, in main
    datasets = load_dataset(
  File "/root/miniconda3/lib/python3.8/site-packages/datasets/load.py", line 742, in load_dataset
    builder_instance.download_and_prepare(
  File "/root/miniconda3/lib/python3.8/site-packages/datasets/builder.py", line 574, in download_and_prepare
    self._download_and_prepare(
  File "/root/miniconda3/lib/python3.8/site-packages/datasets/builder.py", line 652, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/datasets/builder.py", line 1041, in _prepare_split
    for key, table in utils.tqdm(generator, unit=" tables", leave=False, disable=not_verbose):
  File "/root/miniconda3/lib/python3.8/site-packages/tqdm/std.py", line 1133, in __iter__
    for obj in iterable:
  File "/root/miniconda3/lib/python3.8/site-packages/datasets/packaged_modules/csv/csv.py", line 92, in _generate_tables
    csv_file_reader = pd.read_csv(
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 571, in read_csv
    kwds_defaults = _refine_defaults_read(
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1306, in _refine_defaults_read
    raise ValueError("Specified named and prefix; you can only specify one.")
ValueError: Specified named and prefix; you can only specify one.
Downloading and preparing dataset csv/default (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /root/.cache/huggingface/datasets/csv/default-84d6151a5e4565ed/0.0.0/2dc6629a9ff6b5697d82c25b73731dd440507a69cbce8b425db50b751e8fcfd0...

Apparently it's a known issue with the latest Pandas: https://github.com/pandas-dev/pandas/issues/42387

I solved it by downgrading to Pandas 1.2.5.
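For anyone who wants to check their environment before running run_clm.py, here is a minimal sketch that compares the installed Pandas version against 1.2.5, the version that worked for me. The helper names (`parse_version`, `newer_than_known_good`) are made up for illustration, and treating every release after 1.2.5 as suspect is only an assumption: a later Pandas release may well have fixed this. The downgrade itself can be done with `pip install pandas==1.2.5`.

```python
import re

# Version reported as working in this issue; anything newer is only
# *assumed* to be potentially affected.
KNOWN_GOOD = (1, 2, 5)


def parse_version(version):
    """Return the leading numeric components of a version string as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    # A missing patch component (e.g. "1.2") is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def newer_than_known_good(version):
    """True if `version` is newer than the known-good Pandas release."""
    return parse_version(version) > KNOWN_GOOD


if __name__ == "__main__":
    try:
        import pandas as pd
    except ImportError:
        print("pandas is not installed")
    else:
        if newer_than_known_good(pd.__version__):
            print(f"pandas {pd.__version__} is newer than 1.2.5; "
                  "if load_dataset fails on CSV files, try pandas==1.2.5")
        else:
            print(f"pandas {pd.__version__} should not hit this error")
```

If the check prints the warning and CSV loading then fails with the ValueError above, pinning Pandas to 1.2.5 is the workaround that worked here.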