PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

Why does UIE-mini fine-tuning on Windows CPU end before training even starts? #6834

Open tangtangv opened 1 year ago

tangtangv commented 1 year ago

Please describe your question

When fine-tuning UIE-mini on Windows with CPU, the run ends before training even starts. What could be the cause?

I have adjusted batch_size and max_seq_length, but the result is the same.

Fine-tuning command and output:

python finetune.py --device cpu --logging_steps 10 --save_steps 100 --eval_steps 100 --seed 42 --model_name_or_path uie-mini --output_dir ./checkpoint/model_best --train_path ./data/train.txt --dev_path ./data/dev.txt --max_seq_length 128 --per_device_eval_batch_size 2 --per_device_train_batch_size 2 --num_train_epochs 5 --learning_rate 1e-4 --label_names "start_positions""end_positions" --do_train --do_eval --do_export --export_model_dir ./checkpoint/model_best --overwrite_output_dir --disable_tqdm True --metric_for_best_model eval_f1 --load_best_model_at_end True --save_total_limit 1

C:\Users\12426\.conda\envs\py39ocr\lib\site-packages\_distutils_hack\__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
[2023-08-28 11:21:01,256] [ INFO] - The default value for the training argument --report_to will change in v5 (from all installed integrations to none). In v5, you will need to use --report_to all to get the same behavior as now. You should start updating your code and make this info disappear :-).
[2023-08-28 11:21:01,256] [ INFO] - ============================================================
[2023-08-28 11:21:01,256] [ INFO] - Model Configuration Arguments
[2023-08-28 11:21:01,256] [ INFO] - paddle commit id :41ba14f30600373df53839dbf763405cfacb5c92
[2023-08-28 11:21:01,256] [ INFO] - export_model_dir :None
[2023-08-28 11:21:01,256] [ INFO] - model_name_or_path :uie-mini
[2023-08-28 11:21:01,256] [ INFO] - multilingual :False
[2023-08-28 11:21:01,256] [ INFO] -
[2023-08-28 11:21:01,272] [ INFO] - ============================================================
[2023-08-28 11:21:01,272] [ INFO] - Data Configuration Arguments
[2023-08-28 11:21:01,272] [ INFO] - paddle commit id :41ba14f30600373df53839dbf763405cfacb5c92
[2023-08-28 11:21:01,272] [ INFO] - dev_path :./data/dev.txt
[2023-08-28 11:21:01,272] [ INFO] - dynamic_max_length :None
[2023-08-28 11:21:01,272] [ INFO] - max_seq_length :128
[2023-08-28 11:21:01,272] [ INFO] - train_path :./data/train.txt
[2023-08-28 11:21:01,272] [ INFO] -
[2023-08-28 11:21:01,272] [ WARNING] - Process rank: -1, device: cpu, world_size: 1, distributed training: False, 16-bits training: False
[2023-08-28 11:21:01,275] [ INFO] - We are using <class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'> to load 'uie-mini'.
[2023-08-28 11:21:01,275] [ INFO] - Already cached C:\Users\12426\.paddlenlp\models\uie-mini\ernie_3.0_mini_zh_vocab.txt
[2023-08-28 11:21:01,305] [ INFO] - tokenizer config file saved in C:\Users\12426\.paddlenlp\models\uie-mini\tokenizer_config.json
[2023-08-28 11:21:01,306] [ INFO] - Special tokens file saved in C:\Users\12426\.paddlenlp\models\uie-mini\special_tokens_map.json
[2023-08-28 11:21:01,308] [ INFO] - Model config ErnieConfig {
  "attention_probs_dropout_prob": 0.1,
  "enable_recompute": false,
  "fuse": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 2048,
  "model_type": "ernie",
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "pad_token_id": 0,
  "paddlenlp_version": null,
  "pool_act": "tanh",
  "task_id": 0,
  "task_type_vocab_size": 16,
  "type_vocab_size": 4,
  "use_task_id": true,
  "vocab_size": 40000
}

[2023-08-28 11:21:06,274] [ INFO] - All model checkpoint weights were used when initializing UIE.

[2023-08-28 11:21:06,274] [ INFO] - All the weights of UIE were initialized from the model checkpoint at uie-mini. If your task is similar to the task the model of the checkpoint was trained on, you can already use UIE for predictions without further training.
[2023-08-28 11:21:06,370] [ INFO] - ============================================================
[2023-08-28 11:21:06,370] [ INFO] - Training Configuration Arguments
[2023-08-28 11:21:06,370] [ INFO] - paddle commit id :41ba14f30600373df53839dbf763405cfacb5c92
[2023-08-28 11:21:06,371] [ INFO] - _no_sync_in_gradient_accumulation:True
[2023-08-28 11:21:06,371] [ INFO] - activation_quantize_type :None
[2023-08-28 11:21:06,371] [ INFO] - adam_beta1 :0.9
[2023-08-28 11:21:06,372] [ INFO] - adam_beta2 :0.999
[2023-08-28 11:21:06,372] [ INFO] - adam_epsilon :1e-08
[2023-08-28 11:21:06,372] [ INFO] - algo_list :None
[2023-08-28 11:21:06,373] [ INFO] - batch_num_list :None
[2023-08-28 11:21:06,373] [ INFO] - batch_size_list :None
[2023-08-28 11:21:06,373] [ INFO] - bf16 :False
[2023-08-28 11:21:06,373] [ INFO] - bf16_full_eval :False
[2023-08-28 11:21:06,374] [ INFO] - bias_correction :False
[2023-08-28 11:21:06,374] [ INFO] - current_device :cpu
[2023-08-28 11:21:06,374] [ INFO] - dataloader_drop_last :False
[2023-08-28 11:21:06,374] [ INFO] - dataloader_num_workers :0
[2023-08-28 11:21:06,375] [ INFO] - device :cpu
[2023-08-28 11:21:06,375] [ INFO] - disable_tqdm :False
[2023-08-28 11:21:06,375] [ INFO] - do_compress :False
[2023-08-28 11:21:06,376] [ INFO] - do_eval :False
[2023-08-28 11:21:06,376] [ INFO] - do_export :False
[2023-08-28 11:21:06,376] [ INFO] - do_predict :False
[2023-08-28 11:21:06,376] [ INFO] - do_train :False
[2023-08-28 11:21:06,377] [ INFO] - eval_batch_size :2
[2023-08-28 11:21:06,377] [ INFO] - eval_steps :100
[2023-08-28 11:21:06,377] [ INFO] - evaluation_strategy :IntervalStrategy.NO
[2023-08-28 11:21:06,378] [ INFO] - flatten_param_grads :False
[2023-08-28 11:21:06,378] [ INFO] - fp16 :False
[2023-08-28 11:21:06,379] [ INFO] - fp16_full_eval :False
[2023-08-28 11:21:06,379] [ INFO] - fp16_opt_level :O1
[2023-08-28 11:21:06,379] [ INFO] - gradient_accumulation_steps :1
[2023-08-28 11:21:06,380] [ INFO] - greater_is_better :None
[2023-08-28 11:21:06,380] [ INFO] - ignore_data_skip :False
[2023-08-28 11:21:06,381] [ INFO] - input_dtype :int64
[2023-08-28 11:21:06,381] [ INFO] - input_infer_model_path :None
[2023-08-28 11:21:06,381] [ INFO] - label_names :['start_positionsend_positions --do_train --do_eval --do_export --export_model_dir ./checkpoint/model_best --overwrite_output_dir --disable_tqdm True --metric_for_best_model eval_f1 --load_best_model_at_end True --save_total_limit 1']
[2023-08-28 11:21:06,383] [ INFO] - lazy_data_processing :True
[2023-08-28 11:21:06,383] [ INFO] - learning_rate :0.0001
[2023-08-28 11:21:06,384] [ INFO] - load_best_model_at_end :False
[2023-08-28 11:21:06,384] [ INFO] - local_process_index :0
[2023-08-28 11:21:06,384] [ INFO] - local_rank :-1
[2023-08-28 11:21:06,385] [ INFO] - log_level :-1
[2023-08-28 11:21:06,385] [ INFO] - log_level_replica :-1
[2023-08-28 11:21:06,385] [ INFO] - log_on_each_node :True
[2023-08-28 11:21:06,386] [ INFO] - logging_dir :./checkpoint/model_best\runs\Aug28_11-21-01_DESKTOP-VNH1CED
[2023-08-28 11:21:06,386] [ INFO] - logging_first_step :False
[2023-08-28 11:21:06,386] [ INFO] - logging_steps :10
[2023-08-28 11:21:06,386] [ INFO] - logging_strategy :IntervalStrategy.STEPS
[2023-08-28 11:21:06,387] [ INFO] - lr_scheduler_type :SchedulerType.LINEAR
[2023-08-28 11:21:06,387] [ INFO] - max_grad_norm :1.0
[2023-08-28 11:21:06,387] [ INFO] - max_steps :-1
[2023-08-28 11:21:06,388] [ INFO] - metric_for_best_model :None
[2023-08-28 11:21:06,388] [ INFO] - minimum_eval_times :None
[2023-08-28 11:21:06,388] [ INFO] - moving_rate :0.9
[2023-08-28 11:21:06,388] [ INFO] - no_cuda :False
[2023-08-28 11:21:06,389] [ INFO] - num_train_epochs :5.0
[2023-08-28 11:21:06,389] [ INFO] - onnx_format :True
[2023-08-28 11:21:06,389] [ INFO] - optim :OptimizerNames.ADAMW
[2023-08-28 11:21:06,390] [ INFO] - output_dir :./checkpoint/model_best
[2023-08-28 11:21:06,390] [ INFO] - overwrite_output_dir :False
[2023-08-28 11:21:06,390] [ INFO] - past_index :-1
[2023-08-28 11:21:06,390] [ INFO] - per_device_eval_batch_size :2
[2023-08-28 11:21:06,391] [ INFO] - per_device_train_batch_size :2
[2023-08-28 11:21:06,391] [ INFO] - prediction_loss_only :False
[2023-08-28 11:21:06,391] [ INFO] - process_index :0
[2023-08-28 11:21:06,392] [ INFO] - prune_embeddings :False
[2023-08-28 11:21:06,392] [ INFO] - recompute :False
[2023-08-28 11:21:06,392] [ INFO] - remove_unused_columns :True
[2023-08-28 11:21:06,392] [ INFO] - report_to :['visualdl']
[2023-08-28 11:21:06,393] [ INFO] - resume_from_checkpoint :None
[2023-08-28 11:21:06,393] [ INFO] - round_type :round
[2023-08-28 11:21:06,393] [ INFO] - run_name :./checkpoint/model_best
[2023-08-28 11:21:06,393] [ INFO] - save_on_each_node :False
[2023-08-28 11:21:06,394] [ INFO] - save_steps :100
[2023-08-28 11:21:06,394] [ INFO] - save_strategy :IntervalStrategy.STEPS
[2023-08-28 11:21:06,394] [ INFO] - save_total_limit :None
[2023-08-28 11:21:06,395] [ INFO] - scale_loss :32768
[2023-08-28 11:21:06,395] [ INFO] - seed :42
[2023-08-28 11:21:06,397] [ INFO] - sharding :[]
[2023-08-28 11:21:06,397] [ INFO] - sharding_degree :-1
[2023-08-28 11:21:06,398] [ INFO] - should_log :True
[2023-08-28 11:21:06,398] [ INFO] - should_save :True
[2023-08-28 11:21:06,399] [ INFO] - skip_memory_metrics :True
[2023-08-28 11:21:06,400] [ INFO] - strategy :dynabert+ptq
[2023-08-28 11:21:06,400] [ INFO] - train_batch_size :2
[2023-08-28 11:21:06,401] [ INFO] - use_pact :True
[2023-08-28 11:21:06,402] [ INFO] - warmup_ratio :0.1
[2023-08-28 11:21:06,402] [ INFO] - warmup_steps :0
[2023-08-28 11:21:06,402] [ INFO] - weight_decay :0.0
[2023-08-28 11:21:06,403] [ INFO] - weight_quantize_type :channel_wise_abs_max
[2023-08-28 11:21:06,403] [ INFO] - width_mult_list :None
[2023-08-28 11:21:06,403] [ INFO] - world_size :1
[2023-08-28 11:21:06,404] [ INFO] -
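
Note what the parsed Training Configuration Arguments above show: label_names has absorbed everything that followed --label_names on the command line, and do_train, do_eval, do_export, metric_for_best_model, load_best_model_at_end and save_total_limit have all fallen back to their defaults, with do_train :False. With do_train False the Trainer has nothing to run, which would explain the script exiting right after printing the configuration. Below is a sketch of the same invocation with the two label names separated by a space, assuming that quoting is indeed the cause; everything else is unchanged from the command above:

python finetune.py --device cpu --logging_steps 10 --save_steps 100 --eval_steps 100 --seed 42 --model_name_or_path uie-mini --output_dir ./checkpoint/model_best --train_path ./data/train.txt --dev_path ./data/dev.txt --max_seq_length 128 --per_device_eval_batch_size 2 --per_device_train_batch_size 2 --num_train_epochs 5 --learning_rate 1e-4 --label_names "start_positions" "end_positions" --do_train --do_eval --do_export --export_model_dir ./checkpoint/model_best --overwrite_output_dir --disable_tqdm True --metric_for_best_model eval_f1 --load_best_model_at_end True --save_total_limit 1

If that was the problem, the startup log should then show do_train :True and label_names :['start_positions', 'end_positions'] instead of the merged string.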

w5688414 commented 4 months ago

Please check whether your machine has sufficient memory and other resources, and whether the paths to the dataset and other files are correct.
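
As a quick sanity check for the paths, the following can be run from the same directory as finetune.py (a minimal sketch assuming the ./data layout used in the command above; adjust the paths if your layout differs). The first command confirms the files exist and are not empty; the second prints the first line of the training file to confirm it is readable:

dir .\data\train.txt .\data\dev.txt
python -c "print(open('./data/train.txt', encoding='utf-8').readline())"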