Xirider / finetune-gpt2xl

Guide: Finetune GPT2-XL (1.5 Billion Parameters) and finetune GPT-NEO (2.7 B) on a single GPU with Huggingface Transformers using DeepSpeed
MIT License

TypeError: __init__() got an unexpected keyword argument 'no_args_is_help' #24

Open SeekPoint opened 1 year ago

SeekPoint commented 1 year ago

(gh_finetune-gpt2xl) r730ub20@r730ub20-M0:~/llm_dev/finetune-gpt2xl$ deepspeed --num_gpus=1 run_clm.py --deepspeed ds_config.json --model_name_or_path gpt2-xl --train_file train.csv --validation_file validation.csv --do_train --do_eval --fp16 --overwrite_cache --evaluation_strategy="steps" --output_dir finetuned --eval_steps 200 --num_train_epochs 1 --gradient_accumulation_steps 2 --per_device_train_batch_size 1

[2023-05-22 22:00:31,576] [WARNING] [runner.py:191:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-05-22 22:00:31,600] [INFO] [runner.py:541:main] cmd = /usr/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None run_clm.py --deepspeed ds_config.json --model_name_or_path gpt2-xl --train_file train.csv --validation_file validation.csv --do_train --do_eval --fp16 --overwrite_cache --evaluation_strategy=steps --output_dir finetuned --eval_steps 200 --num_train_epochs 1 --gradient_accumulation_steps 2 --per_device_train_batch_size 1
[2023-05-22 22:00:33,028] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0]}
[2023-05-22 22:00:33,028] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-05-22 22:00:33,028] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-05-22 22:00:33,028] [INFO] [launch.py:247:main] dist_world_size=1
[2023-05-22 22:00:33,028] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0
[2023-05-22 22:00:34,832] [INFO] [comm.py:622:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
05/22/2023 22:00:34 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: True
05/22/2023 22:00:34 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=ds_config.json,
disable_tqdm=False,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_steps=200,
evaluation_strategy=IntervalStrategy.STEPS,
fp16=True,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gradient_accumulation_steps=2,
greater_is_better=None,
group_by_length=False,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_on_each_node=True,
logging_dir=runs/May22_22-00-34_r730ub20-M0,
logging_first_step=False,
logging_steps=500,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_type=SchedulerType.LINEAR,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=1.0,
output_dir=finetuned,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=1,
prediction_loss_only=False,
push_to_hub=False,
remove_unused_columns=True,
report_to=['wandb'],
resume_from_checkpoint=None,
run_name=finetuned,
save_steps=500,
save_strategy=IntervalStrategy.STEPS,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_legacy_prediction_loop=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
)
05/22/2023 22:00:36 - WARNING - datasets.builder - Using custom data configuration default-3bfffae691dad1b0
05/22/2023 22:00:36 - WARNING - datasets.builder - Reusing dataset csv (/home/r730ub20/.cache/huggingface/datasets/csv/default-3bfffae691dad1b0/0.0.0/2dc6629a9ff6b5697d82c25b73731dd440507a69cbce8b425db50b751e8fcfd0)
[INFO|configuration_utils.py:517] 2023-05-22 22:00:36,541 >> loading configuration file https://huggingface.co/gpt2-xl/resolve/main/config.json from cache at /home/r730ub20/.cache/huggingface/transformers/d2de8fec009fa9b9196047559bcac6c1f02a9c500718b4346bc516354965b1ca.d684cb2afa3f8c44c73bd67537d9aa5ff6044658793e077d7306ef2e37dd79bd
[INFO|configuration_utils.py:553] 2023-05-22 22:00:36,543 >> Model config GPT2Config {
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "gradient_checkpointing": false,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 1600,
  "n_head": 25,
  "n_inner": null,
  "n_layer": 48,
  "n_positions": 1024,
  "output_past": true,
  "resid_pdrop": 0.1,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "transformers_version": "4.7.0",
  "use_cache": true,
  "vocab_size": 50257
}

[INFO|configuration_utils.py:517] 2023-05-22 22:00:36,953 >> loading configuration file https://huggingface.co/gpt2-xl/resolve/main/config.json from cache at /home/r730ub20/.cache/huggingface/transformers/d2de8fec009fa9b9196047559bcac6c1f02a9c500718b4346bc516354965b1ca.d684cb2afa3f8c44c73bd67537d9aa5ff6044658793e077d7306ef2e37dd79bd
[INFO|configuration_utils.py:553] 2023-05-22 22:00:36,954 >> Model config GPT2Config {
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "gradient_checkpointing": false,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 1600,
  "n_head": 25,
  "n_inner": null,
  "n_layer": 48,
  "n_positions": 1024,
  "output_past": true,
  "resid_pdrop": 0.1,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "transformers_version": "4.7.0",
  "use_cache": true,
  "vocab_size": 50257
}

[INFO|tokenization_utils_base.py:1717] 2023-05-22 22:00:39,950 >> loading file https://huggingface.co/gpt2-xl/resolve/main/vocab.json from cache at /home/r730ub20/.cache/huggingface/transformers/8560a2df03f812b276794ae6935255d0590522553a4c8103155472b07591a21b.c7ed1f96aac49e745788faa77ba0a26a392643a50bb388b9c04ff469e555241f
[INFO|tokenization_utils_base.py:1717] 2023-05-22 22:00:39,950 >> loading file https://huggingface.co/gpt2-xl/resolve/main/merges.txt from cache at /home/r730ub20/.cache/huggingface/transformers/18fe27e0b70062b3e45fc4e827d5449d9fe85875937594da927e48cb657366d1.5d12962c5ee615a4c803841266e9c3be9a691a924f72d395d3a6c6c81157788b
[INFO|tokenization_utils_base.py:1717] 2023-05-22 22:00:39,950 >> loading file https://huggingface.co/gpt2-xl/resolve/main/tokenizer.json from cache at /home/r730ub20/.cache/huggingface/transformers/aabb8839163cd911f810ab23f5ae8c966b9b9ea60622c429020611caa389b04b.cf2d0ecb83b6df91b3dbb53f1d1e4c311578bfd3aa0e04934215a49bf9898df0
[INFO|tokenization_utils_base.py:1717] 2023-05-22 22:00:39,950 >> loading file https://huggingface.co/gpt2-xl/resolve/main/added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:1717] 2023-05-22 22:00:39,950 >> loading file https://huggingface.co/gpt2-xl/resolve/main/special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:1717] 2023-05-22 22:00:39,950 >> loading file https://huggingface.co/gpt2-xl/resolve/main/tokenizer_config.json from cache at None
[INFO|modeling_utils.py:1152] 2023-05-22 22:00:40,482 >> loading weights file https://huggingface.co/gpt2-xl/resolve/main/pytorch_model.bin from cache at /home/r730ub20/.cache/huggingface/transformers/96569b907e56747ce3e593c6a13d8475b8c733a64aab8af8f602b90d94c4af71.8fbbcdf404c82c5967934d411f1462fa0574d639f2aa398aa3754fced1bb26c0
[INFO|modeling_utils.py:1336] 2023-05-22 22:00:58,095 >> All model checkpoint weights were used when initializing GPT2LMHeadModel.

[INFO|modeling_utils.py:1344] 2023-05-22 22:00:58,095 >> All the weights of GPT2LMHeadModel were initialized from the model checkpoint at gpt2-xl.
If your task is similar to the task the model of the checkpoint was trained on, you can already use GPT2LMHeadModel for predictions without further training.
05/22/2023 22:00:58 - WARNING - datasets.fingerprint - Parameter 'function'=<function main.<locals>.tokenize_function at 0x7f2363e61af0> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
  0%|          | 0/1 [00:00<?, ?ba/s]
[WARNING|tokenization_utils_base.py:3171] 2023-05-22 22:01:02,910 >> Token indices sequence length is longer than the specified maximum sequence length for this model (1462828 > 1024). Running this sequence through the model will result in indexing errors
100%|██████████| 1/1 [00:05<00:00, 5.10s/ba]
100%|██████████| 1/1 [00:00<00:00, 61.16ba/s]
100%|██████████| 1/1 [00:01<00:00, 1.46s/ba]
100%|██████████| 1/1 [00:00<00:00, 194.43ba/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either:
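For context, the failing command points DeepSpeed at ds_config.json, which is what lets the 1.5B-parameter model fit on a single GPU via ZeRO partitioning and CPU offload. Below is a minimal sketch of that kind of config, written as Python for illustration; the keys shown are standard DeepSpeed options, but this is not necessarily the exact file shipped with this repo.

```python
# Illustrative only: a minimal DeepSpeed config of the kind run_clm.py is
# pointed at via --deepspeed ds_config.json. Values set to "auto" are filled
# in by the Hugging Face Trainer integration. Not the repo's exact file.
import json

ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},  # keep optimizer state in CPU RAM
        "contiguous_gradients": True,
        "overlap_comm": True,
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```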
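The traceback itself is cut off above, but no_args_is_help is a keyword that click.Command appears to accept only from click 8.0 onward, so one plausible cause is a version mismatch between click and whichever CLI wrapper passes that argument (for example typer, or tooling pulled in through report_to=['wandb']). A quick diagnostic sketch, assuming that is the failure path:

```python
# Diagnostic sketch: print the versions of the packages most likely involved.
# Assumption: some dependency passes no_args_is_help to click.Command, which
# older click releases do not accept.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("click", "typer", "wandb", "deepspeed", "transformers"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```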