PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

[Question]: lr does not change after passing lr_scheduler_type cosine to Trainer #3800

Closed. briup1 closed this issue 1 year ago.

briup1 commented 1 year ago

Please describe your question

When using the UIE finetune.py script, I pass lr_scheduler_type cosine, but the monitored lr does not change at all. Is this expected?

ZHUI commented 1 year ago

Hi, could you try upgrading paddlenlp to the latest version? pip install paddlenlp==2.4.3

briup1 commented 1 year ago

It still does not take effect after upgrading. @ZHUI

ZHUI commented 1 year ago
[2022-11-23 11:11:17,169] [    INFO] - max_steps is given, it will override any value given in num_train_epochs
[2022-11-23 11:11:17,170] [    INFO] - ============================================================
[2022-11-23 11:11:17,171] [    INFO] -     Training Configuration Arguments    
[2022-11-23 11:11:17,171] [    INFO] - paddle commit id              :2ea3700a336ea844389298a7520b386d4ec5fc3b
[2022-11-23 11:11:17,171] [    INFO] - _no_sync_in_gradient_accumulation:True
[2022-11-23 11:11:17,172] [    INFO] - adam_beta1                    :0.9
[2022-11-23 11:11:17,172] [    INFO] - adam_beta2                    :0.999
[2022-11-23 11:11:17,172] [    INFO] - adam_epsilon                  :1e-08
[2022-11-23 11:11:17,172] [    INFO] - bf16                          :False
[2022-11-23 11:11:17,173] [    INFO] - bf16_full_eval                :False
[2022-11-23 11:11:17,173] [    INFO] - current_device                :gpu:4
[2022-11-23 11:11:17,173] [    INFO] - dataloader_drop_last          :False
[2022-11-23 11:11:17,174] [    INFO] - dataloader_num_workers        :0
[2022-11-23 11:11:17,174] [    INFO] - device                        :gpu
[2022-11-23 11:11:17,174] [    INFO] - disable_tqdm                  :True
[2022-11-23 11:11:17,174] [    INFO] - do_eval                       :True
[2022-11-23 11:11:17,175] [    INFO] - do_export                     :True
[2022-11-23 11:11:17,175] [    INFO] - do_predict                    :True
[2022-11-23 11:11:17,175] [    INFO] - do_train                      :True
[2022-11-23 11:11:17,176] [    INFO] - eval_batch_size               :32
[2022-11-23 11:11:17,176] [    INFO] - eval_steps                    :200
[2022-11-23 11:11:17,176] [    INFO] - evaluation_strategy           :IntervalStrategy.STEPS
[2022-11-23 11:11:17,176] [    INFO] - fp16                          :False
[2022-11-23 11:11:17,177] [    INFO] - fp16_full_eval                :False
[2022-11-23 11:11:17,177] [    INFO] - fp16_opt_level                :O2
[2022-11-23 11:11:17,177] [    INFO] - gradient_accumulation_steps   :2
[2022-11-23 11:11:17,177] [    INFO] - greater_is_better             :True
[2022-11-23 11:11:17,178] [    INFO] - ignore_data_skip              :False
[2022-11-23 11:11:17,178] [    INFO] - label_names                   :None
[2022-11-23 11:11:17,178] [    INFO] - learning_rate                 :0.0001
[2022-11-23 11:11:17,179] [    INFO] - load_best_model_at_end        :True
[2022-11-23 11:11:17,179] [    INFO] - local_process_index           :0
[2022-11-23 11:11:17,179] [    INFO] - local_rank                    :0
[2022-11-23 11:11:17,180] [    INFO] - log_level                     :-1
[2022-11-23 11:11:17,180] [    INFO] - log_level_replica             :-1
[2022-11-23 11:11:17,180] [    INFO] - log_on_each_node              :True
[2022-11-23 11:11:17,180] [    INFO] - logging_dir                   :./tmp/xnli_cn/runs/Nov23_11-11-07_yq01-qianmo-com-255-129-12.yq01
[2022-11-23 11:11:17,181] [    INFO] - logging_first_step            :False
[2022-11-23 11:11:17,181] [    INFO] - logging_steps                 :10
[2022-11-23 11:11:17,181] [    INFO] - logging_strategy              :IntervalStrategy.STEPS
[2022-11-23 11:11:17,182] [    INFO] - lr_scheduler_type             :SchedulerType.COSINE
[2022-11-23 11:11:17,182] [    INFO] - max_grad_norm                 :1.0
[2022-11-23 11:11:17,182] [    INFO] - max_steps                     :10000
[2022-11-23 11:11:17,183] [    INFO] - metric_for_best_model         :eval_accuracy
[2022-11-23 11:11:17,183] [    INFO] - minimum_eval_times            :-1
[2022-11-23 11:11:17,183] [    INFO] - no_cuda                       :False
[2022-11-23 11:11:17,184] [    INFO] - num_train_epochs              :3
[2022-11-23 11:11:17,184] [    INFO] - optim                         :OptimizerNames.ADAMW
[2022-11-23 11:11:17,184] [    INFO] - output_dir                    :./tmp/xnli_cn
[2022-11-23 11:11:17,184] [    INFO] - overwrite_output_dir          :False
[2022-11-23 11:11:17,185] [    INFO] - past_index                    :-1
[2022-11-23 11:11:17,185] [    INFO] - per_device_eval_batch_size    :32
[2022-11-23 11:11:17,185] [    INFO] - per_device_train_batch_size   :32
[2022-11-23 11:11:17,185] [    INFO] - prediction_loss_only          :False
[2022-11-23 11:11:17,186] [    INFO] - process_index                 :0
[2022-11-23 11:11:17,186] [    INFO] - recompute                     :True
[2022-11-23 11:11:17,186] [    INFO] - remove_unused_columns         :True
[2022-11-23 11:11:17,187] [    INFO] - report_to                     :['visualdl']
[2022-11-23 11:11:17,187] [    INFO] - resume_from_checkpoint        :None
[2022-11-23 11:11:17,187] [    INFO] - run_name                      :./tmp/xnli_cn
[2022-11-23 11:11:17,187] [    INFO] - save_on_each_node             :False
[2022-11-23 11:11:17,188] [    INFO] - save_steps                    :200
[2022-11-23 11:11:17,188] [    INFO] - save_strategy                 :IntervalStrategy.STEPS
[2022-11-23 11:11:17,188] [    INFO] - save_total_limit              :3
[2022-11-23 11:11:17,189] [    INFO] - scale_loss                    :32768
[2022-11-23 11:11:17,189] [    INFO] - seed                          :42
[2022-11-23 11:11:17,189] [    INFO] - sharding                      :[]
[2022-11-23 11:11:17,189] [    INFO] - sharding_degree               :2
[2022-11-23 11:11:17,190] [    INFO] - should_log                    :True
[2022-11-23 11:11:17,190] [    INFO] - should_save                   :True
[2022-11-23 11:11:17,190] [    INFO] - train_batch_size              :32
[2022-11-23 11:11:17,190] [    INFO] - warmup_ratio                  :0.0
[2022-11-23 11:11:17,191] [    INFO] - warmup_steps                  :0
[2022-11-23 11:11:17,191] [    INFO] - weight_decay                  :0.01
[2022-11-23 11:11:17,191] [    INFO] - world_size                    :4
[2022-11-23 11:11:17,191] [    INFO] - 
[2022-11-23 11:11:17,254] [    INFO] - ***** Running training *****
[2022-11-23 11:11:17,254] [    INFO] -   Num examples = 392701
[2022-11-23 11:11:17,254] [    INFO] -   Num Epochs = 7
[2022-11-23 11:11:17,255] [    INFO] -   Instantaneous batch size per device = 32
[2022-11-23 11:11:17,255] [    INFO] -   Total train batch size (w. parallel, distributed & accumulation) = 256
[2022-11-23 11:11:17,255] [    INFO] -   Gradient Accumulation steps = 2
[2022-11-23 11:11:17,255] [    INFO] -   Total optimization steps = 10000
[2022-11-23 11:11:17,255] [    INFO] -   Total num train samples = 2560000
[2022-11-23 11:11:18,787] [    INFO] -   Number of trainable parameters = 99868419
[2022-11-23 11:11:27,064] [    INFO] - loss: 1.0899848, learning_rate: 9.999975326009292e-05, global_step: 10, interval_runtime: 8.2708, interval_samples_per_second: 309.523, interval_steps_per_second: 1.209, epoch: 0.0065
[2022-11-23 11:11:32,768] [    INFO] - loss: 0.94208021, learning_rate: 9.999901304280685e-05, global_step: 20, interval_runtime: 5.7045, interval_samples_per_second: 448.768, interval_steps_per_second: 1.753, epoch: 0.013
[2022-11-23 11:11:38,476] [    INFO] - loss: 0.86334038, learning_rate: 9.99977793554475e-05, global_step: 30, interval_runtime: 5.7078, interval_samples_per_second: 448.509, interval_steps_per_second: 1.752, epoch: 0.0196
[2022-11-23 11:11:44,074] [    INFO] - loss: 0.80716286, learning_rate: 9.999605221019081e-05, global_step: 40, interval_runtime: 5.5982, interval_samples_per_second: 457.29, interval_steps_per_second: 1.786, epoch: 0.0261
[2022-11-23 11:11:49,725] [    INFO] - loss: 0.7661695, learning_rate: 9.999383162408304e-05, global_step: 50, interval_runtime: 5.6511, interval_samples_per_second: 453.009, interval_steps_per_second: 1.77, epoch: 0.0326
[2022-11-23 11:11:55,454] [    INFO] - loss: 0.73705273, learning_rate: 9.999111761904046e-05, global_step: 60, interval_runtime: 5.7298, interval_samples_per_second: 446.789, interval_steps_per_second: 1.745, epoch: 0.0391
[2022-11-23 11:12:01,294] [    INFO] - loss: 0.7377521, learning_rate: 9.998791022184922e-05, global_step: 70, interval_runtime: 5.8394, interval_samples_per_second: 438.401, interval_steps_per_second: 1.713, epoch: 0.0456
[2022-11-23 11:12:06,893] [    INFO] - loss: 0.71224113, learning_rate: 9.9984209464165e-05, global_step: 80, interval_runtime: 5.5986, interval_samples_per_second: 457.256, interval_steps_per_second: 1.786, epoch: 0.0522
ZHUI commented 1 year ago

@briup1 The log above is from my run; you can see the cosine schedule takes effect (the learning_rate decreases step by step). Please paste your log.

briup1 commented 1 year ago
[2022-11-23 01:26:08,092] [    INFO] - We are using <class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'> to load 'uie-base-en'.
[2022-11-23 01:26:15,034] [    INFO] - ============================================================
[2022-11-23 01:26:15,034] [    INFO] -     Training Configuration Arguments    
[2022-11-23 01:26:15,034] [    INFO] - paddle commit id              :590b4dbcdd989324089ce43c22ef151c746c92a3
[2022-11-23 01:26:15,035] [    INFO] - _no_sync_in_gradient_accumulation:True
[2022-11-23 01:26:15,035] [    INFO] - activation_quantize_type      :None
[2022-11-23 01:26:15,035] [    INFO] - adam_beta1                    :0.9
[2022-11-23 01:26:15,036] [    INFO] - adam_beta2                    :0.999
[2022-11-23 01:26:15,036] [    INFO] - adam_epsilon                  :1e-08
[2022-11-23 01:26:15,036] [    INFO] - algo_list                     :None
[2022-11-23 01:26:15,036] [    INFO] - batch_num_list                :None
[2022-11-23 01:26:15,037] [    INFO] - batch_size_list               :None
[2022-11-23 01:26:15,037] [    INFO] - bf16                          :False
[2022-11-23 01:26:15,037] [    INFO] - bf16_full_eval                :False
[2022-11-23 01:26:15,038] [    INFO] - bias_correction               :False
[2022-11-23 01:26:15,039] [    INFO] - current_device                :gpu:0
[2022-11-23 01:26:15,039] [    INFO] - dataloader_drop_last          :False
[2022-11-23 01:26:15,040] [    INFO] - dataloader_num_workers        :0
[2022-11-23 01:26:15,040] [    INFO] - device                        :gpu
[2022-11-23 01:26:15,041] [    INFO] - disable_tqdm                  :True
[2022-11-23 01:26:15,041] [    INFO] - do_compress                   :False
[2022-11-23 01:26:15,042] [    INFO] - do_eval                       :True
[2022-11-23 01:26:15,042] [    INFO] - do_export                     :True
[2022-11-23 01:26:15,042] [    INFO] - do_predict                    :False
[2022-11-23 01:26:15,043] [    INFO] - do_train                      :True
[2022-11-23 01:26:15,043] [    INFO] - eval_batch_size               :32
[2022-11-23 01:26:15,044] [    INFO] - eval_steps                    :100
[2022-11-23 01:26:15,044] [    INFO] - evaluation_strategy           :IntervalStrategy.STEPS
[2022-11-23 01:26:15,044] [    INFO] - fp16                          :False
[2022-11-23 01:26:15,045] [    INFO] - fp16_full_eval                :False
[2022-11-23 01:26:15,045] [    INFO] - fp16_opt_level                :O1
[2022-11-23 01:26:15,046] [    INFO] - gradient_accumulation_steps   :1
[2022-11-23 01:26:15,046] [    INFO] - greater_is_better             :True
[2022-11-23 01:26:15,047] [    INFO] - ignore_data_skip              :True
[2022-11-23 01:26:15,047] [    INFO] - input_infer_model_path        :None
[2022-11-23 01:26:15,047] [    INFO] - label_names                   :['start_positions', 'end_positions']
[2022-11-23 01:26:15,047] [    INFO] - learning_rate                 :2e-05
[2022-11-23 01:26:15,048] [    INFO] - load_best_model_at_end        :True
[2022-11-23 01:26:15,048] [    INFO] - local_process_index           :0
[2022-11-23 01:26:15,049] [    INFO] - local_rank                    :0
[2022-11-23 01:26:15,049] [    INFO] - log_level                     :-1
[2022-11-23 01:26:15,049] [    INFO] - log_level_replica             :-1
[2022-11-23 01:26:15,050] [    INFO] - log_on_each_node              :True
[2022-11-23 01:26:15,050] [    INFO] - logging_dir                   :./finetuned_model_qq/runs/Nov23_01-24-57_6bf6d99aotq62-0
[2022-11-23 01:26:15,050] [    INFO] - logging_first_step            :False
[2022-11-23 01:26:15,051] [    INFO] - logging_steps                 :20
[2022-11-23 01:26:15,051] [    INFO] - logging_strategy              :IntervalStrategy.STEPS
[2022-11-23 01:26:15,052] [    INFO] - lr_scheduler_type             :SchedulerType.CONSTANT
[2022-11-23 01:26:15,052] [    INFO] - max_grad_norm                 :1.0
[2022-11-23 01:26:15,053] [    INFO] - max_steps                     :-1
[2022-11-23 01:26:15,053] [    INFO] - metric_for_best_model         :eval_f1
[2022-11-23 01:26:15,054] [    INFO] - minimum_eval_times            :None
[2022-11-23 01:26:15,054] [    INFO] - moving_rate                   :0.9
[2022-11-23 01:26:15,054] [    INFO] - no_cuda                       :False
[2022-11-23 01:26:15,055] [    INFO] - num_train_epochs              :50.0
[2022-11-23 01:26:15,055] [    INFO] - onnx_format                   :True
[2022-11-23 01:26:15,055] [    INFO] - optim                         :OptimizerNames.ADAMW
[2022-11-23 01:26:15,056] [    INFO] - output_dir                    :./finetuned_model_qq
[2022-11-23 01:26:15,056] [    INFO] - overwrite_output_dir          :True
[2022-11-23 01:26:15,057] [    INFO] - past_index                    :-1
[2022-11-23 01:26:15,057] [    INFO] - per_device_eval_batch_size    :32
[2022-11-23 01:26:15,057] [    INFO] - per_device_train_batch_size   :32
[2022-11-23 01:26:15,058] [    INFO] - prediction_loss_only          :False
[2022-11-23 01:26:15,058] [    INFO] - process_index                 :0
[2022-11-23 01:26:15,058] [    INFO] - recompute                     :False
[2022-11-23 01:26:15,059] [    INFO] - remove_unused_columns         :True
[2022-11-23 01:26:15,059] [    INFO] - report_to                     :['visualdl']
[2022-11-23 01:26:15,059] [    INFO] - resume_from_checkpoint        :None
[2022-11-23 01:26:15,060] [    INFO] - round_type                    :round
[2022-11-23 01:26:15,060] [    INFO] - run_name                      :./finetuned_model_qq
[2022-11-23 01:26:15,061] [    INFO] - save_on_each_node             :False
[2022-11-23 01:26:15,061] [    INFO] - save_steps                    :100
[2022-11-23 01:26:15,061] [    INFO] - save_strategy                 :IntervalStrategy.STEPS
[2022-11-23 01:26:15,062] [    INFO] - save_total_limit              :1
[2022-11-23 01:26:15,062] [    INFO] - scale_loss                    :32768
[2022-11-23 01:26:15,062] [    INFO] - seed                          :42
[2022-11-23 01:26:15,062] [    INFO] - sharding                      :[]
[2022-11-23 01:26:15,063] [    INFO] - sharding_degree               :-1
[2022-11-23 01:26:15,063] [    INFO] - should_log                    :True
[2022-11-23 01:26:15,064] [    INFO] - should_save                   :True
[2022-11-23 01:26:15,064] [    INFO] - strategy                      :dynabert+ptq
[2022-11-23 01:26:15,064] [    INFO] - train_batch_size              :32
[2022-11-23 01:26:15,065] [    INFO] - use_pact                      :True
[2022-11-23 01:26:15,065] [    INFO] - warmup_ratio                  :0.1
[2022-11-23 01:26:15,066] [    INFO] - warmup_steps                  :0
[2022-11-23 01:26:15,066] [    INFO] - weight_decay                  :0.0
[2022-11-23 01:26:15,066] [    INFO] - weight_quantize_type          :channel_wise_abs_max
[2022-11-23 01:26:15,067] [    INFO] - width_mult_list               :None
[2022-11-23 01:26:15,067] [    INFO] - world_size                    :4
[2022-11-23 01:26:15,067] [    INFO] - 
[2022-11-23 01:26:15,155] [    INFO] - ***** Running training *****
[2022-11-23 01:26:15,155] [    INFO] -   Num examples = 10631
[2022-11-23 01:26:15,156] [    INFO] -   Num Epochs = 50
[2022-11-23 01:26:15,156] [    INFO] -   Instantaneous batch size per device = 32
[2022-11-23 01:26:15,156] [    INFO] -   Total train batch size (w. parallel, distributed & accumulation) = 128
[2022-11-23 01:26:15,157] [    INFO] -   Gradient Accumulation steps = 1
[2022-11-23 01:26:15,157] [    INFO] -   Total optimization steps = 4200.0
[2022-11-23 01:26:15,158] [    INFO] -   Total num train samples = 531550.0
[2022-11-23 01:26:15,429] [    INFO] -   Number of trainable parameters = 109485314
[2022-11-23 01:26:37,530] [    INFO] - loss: 0.00170259, learning_rate: 2e-05, global_step: 20, interval_runtime: 22.0947, interval_samples_per_second: 115.865, interval_steps_per_second: 0.905, epoch: 0.2381
[2022-11-23 01:26:58,366] [    INFO] - loss: 0.00105053, learning_rate: 2e-05, global_step: 40, interval_runtime: 20.8368, interval_samples_per_second: 122.86, interval_steps_per_second: 0.96, epoch: 0.4762
[2022-11-23 01:27:13,205] [    INFO] - loss: 0.00058154, learning_rate: 2e-05, global_step: 60, interval_runtime: 14.8386, interval_samples_per_second: 172.522, interval_steps_per_second: 1.348, epoch: 0.7143
[2022-11-23 01:27:34,129] [    INFO] - loss: 0.00063376, learning_rate: 2e-05, global_step: 80, interval_runtime: 20.924, interval_samples_per_second: 122.347, interval_steps_per_second: 0.956, epoch: 0.9524
[2022-11-23 01:27:54,536] [    INFO] - loss: 0.00043808, learning_rate: 2e-05, global_step: 100, interval_runtime: 20.4046, interval_samples_per_second: 125.462, interval_steps_per_second: 0.98, epoch: 1.1905

@ZHUI Here I set constant, but the behavior is the same with cosine.

Name: paddlenlp
Version: 2.4.3
Summary: Easy-to-use and powerful NLP library with Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including Neural Search, Question Answering, Information Extraction and Sentiment Analysis end-to-end system.
Home-page: https://github.com/PaddlePaddle/PaddleNLP
Author: PaddleNLP Team
Author-email: paddlenlp@baidu.com
License: Apache 2.0
Location: /zhangxuejie373/anaconda3/envs/p38_paddle/lib/python3.8/site-packages
Requires: colorama, colorlog, datasets, dill, jieba, multiprocess, paddle2onnx, paddlefsl, protobuf, sentencepiece, seqeval, tqdm, visualdl
Required-by: 
ZHUI commented 1 year ago

The lr_scheduler_type in your log is the constant type. Please check your launch command and how the argument is being passed:

lr_scheduler_type             :SchedulerType.CONSTANT

With constant, an unchanging learning rate is the expected behavior.
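
For reference, a minimal standalone sketch of how the two schedule types should differ, using `paddle.optimizer.lr.CosineAnnealingDecay` directly rather than the Trainer's internal scheduler, so treat the exact decay shape as an assumption:

```python
# Minimal sketch comparing a constant lr with a cosine schedule, assuming
# Paddle 2.x's paddle.optimizer.lr.CosineAnnealingDecay (not necessarily the
# exact scheduler the Trainer builds internally).
import paddle

base_lr, max_steps = 2e-05, 1665

# constant: the value is a plain float and never changes, as in the log above
constant_lr = base_lr

# cosine: anneals from base_lr toward eta_min (default 0) over T_max steps
cosine = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=base_lr,
                                                  T_max=max_steps)

for step in range(0, 100, 10):
    print(f"step {step}: constant={constant_lr:.2e}, cosine={cosine.get_lr():.8e}")
    for _ in range(10):
        cosine.step()  # advance the cosine schedule by one optimizer step
```

With cosine, the printed learning rate should already drift downward within the first hundred steps, which is exactly what my log above shows and yours does not.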

briup1 commented 1 year ago
[2022-11-23 05:35:27,615] [    INFO] - ============================================================
[2022-11-23 05:35:27,615] [    INFO] -     Training Configuration Arguments    
[2022-11-23 05:35:27,615] [    INFO] - paddle commit id              :590b4dbcdd989324089ce43c22ef151c746c92a3
[2022-11-23 05:35:27,616] [    INFO] - _no_sync_in_gradient_accumulation:True
[2022-11-23 05:35:27,616] [    INFO] - activation_quantize_type      :None
[2022-11-23 05:35:27,617] [    INFO] - adam_beta1                    :0.9
[2022-11-23 05:35:27,617] [    INFO] - adam_beta2                    :0.999
[2022-11-23 05:35:27,617] [    INFO] - adam_epsilon                  :1e-08
[2022-11-23 05:35:27,618] [    INFO] - algo_list                     :None
[2022-11-23 05:35:27,618] [    INFO] - batch_num_list                :None
[2022-11-23 05:35:27,618] [    INFO] - batch_size_list               :None
[2022-11-23 05:35:27,619] [    INFO] - bf16                          :False
[2022-11-23 05:35:27,619] [    INFO] - bf16_full_eval                :False
[2022-11-23 05:35:27,619] [    INFO] - bias_correction               :False
[2022-11-23 05:35:27,620] [    INFO] - current_device                :gpu:3
[2022-11-23 05:35:27,620] [    INFO] - dataloader_drop_last          :False
[2022-11-23 05:35:27,620] [    INFO] - dataloader_num_workers        :0
[2022-11-23 05:35:27,621] [    INFO] - device                        :gpu
[2022-11-23 05:35:27,621] [    INFO] - disable_tqdm                  :True
[2022-11-23 05:35:27,621] [    INFO] - do_compress                   :False
[2022-11-23 05:35:27,622] [    INFO] - do_eval                       :True
[2022-11-23 05:35:27,622] [    INFO] - do_export                     :True
[2022-11-23 05:35:27,622] [    INFO] - do_predict                    :False
[2022-11-23 05:35:27,623] [    INFO] - do_train                      :True
[2022-11-23 05:35:27,623] [    INFO] - eval_batch_size               :32
[2022-11-23 05:35:27,623] [    INFO] - eval_steps                    :100
[2022-11-23 05:35:27,624] [    INFO] - evaluation_strategy           :IntervalStrategy.STEPS
[2022-11-23 05:35:27,624] [    INFO] - fp16                          :False
[2022-11-23 05:35:27,624] [    INFO] - fp16_full_eval                :False
[2022-11-23 05:35:27,625] [    INFO] - fp16_opt_level                :O1
[2022-11-23 05:35:27,625] [    INFO] - gradient_accumulation_steps   :1
[2022-11-23 05:35:27,625] [    INFO] - greater_is_better             :True
[2022-11-23 05:35:27,626] [    INFO] - ignore_data_skip              :True
[2022-11-23 05:35:27,626] [    INFO] - input_infer_model_path        :None
[2022-11-23 05:35:27,626] [    INFO] - label_names                   :['start_positions', 'end_positions']
[2022-11-23 05:35:27,627] [    INFO] - learning_rate                 :2e-05
[2022-11-23 05:35:27,627] [    INFO] - load_best_model_at_end        :True
[2022-11-23 05:35:27,627] [    INFO] - local_process_index           :0
[2022-11-23 05:35:27,628] [    INFO] - local_rank                    :-1
[2022-11-23 05:35:27,628] [    INFO] - log_level                     :-1
[2022-11-23 05:35:27,628] [    INFO] - log_level_replica             :-1
[2022-11-23 05:35:27,629] [    INFO] - log_on_each_node              :True
[2022-11-23 05:35:27,629] [    INFO] - logging_dir                   :./finetuned_model_qq/runs/Nov23_05-35-18_6bf6d99aotq62-0
[2022-11-23 05:35:27,630] [    INFO] - logging_first_step            :False
[2022-11-23 05:35:27,630] [    INFO] - logging_steps                 :10
[2022-11-23 05:35:27,630] [    INFO] - logging_strategy              :IntervalStrategy.STEPS
[2022-11-23 05:35:27,631] [    INFO] - lr_scheduler_type             :SchedulerType.COSINE
[2022-11-23 05:35:27,631] [    INFO] - max_grad_norm                 :1.0
[2022-11-23 05:35:27,631] [    INFO] - max_steps                     :-1
[2022-11-23 05:35:27,631] [    INFO] - metric_for_best_model         :eval_f1
[2022-11-23 05:35:27,632] [    INFO] - minimum_eval_times            :None
[2022-11-23 05:35:27,632] [    INFO] - moving_rate                   :0.9
[2022-11-23 05:35:27,633] [    INFO] - no_cuda                       :False
[2022-11-23 05:35:27,633] [    INFO] - num_train_epochs              :5.0
[2022-11-23 05:35:27,633] [    INFO] - onnx_format                   :True
[2022-11-23 05:35:27,634] [    INFO] - optim                         :OptimizerNames.ADAMW
[2022-11-23 05:35:27,634] [    INFO] - output_dir                    :./finetuned_model_qq
[2022-11-23 05:35:27,634] [    INFO] - overwrite_output_dir          :True
[2022-11-23 05:35:27,635] [    INFO] - past_index                    :-1
[2022-11-23 05:35:27,635] [    INFO] - per_device_eval_batch_size    :32
[2022-11-23 05:35:27,635] [    INFO] - per_device_train_batch_size   :32
[2022-11-23 05:35:27,635] [    INFO] - prediction_loss_only          :False
[2022-11-23 05:35:27,636] [    INFO] - process_index                 :0
[2022-11-23 05:35:27,636] [    INFO] - recompute                     :False
[2022-11-23 05:35:27,636] [    INFO] - remove_unused_columns         :True
[2022-11-23 05:35:27,637] [    INFO] - report_to                     :['visualdl']
[2022-11-23 05:35:27,637] [    INFO] - resume_from_checkpoint        :None
[2022-11-23 05:35:27,637] [    INFO] - round_type                    :round
[2022-11-23 05:35:27,638] [    INFO] - run_name                      :./finetuned_model_qq
[2022-11-23 05:35:27,638] [    INFO] - save_on_each_node             :False
[2022-11-23 05:35:27,638] [    INFO] - save_steps                    :100
[2022-11-23 05:35:27,639] [    INFO] - save_strategy                 :IntervalStrategy.STEPS
[2022-11-23 05:35:27,639] [    INFO] - save_total_limit              :1
[2022-11-23 05:35:27,639] [    INFO] - scale_loss                    :32768
[2022-11-23 05:35:27,640] [    INFO] - seed                          :42
[2022-11-23 05:35:27,640] [    INFO] - sharding                      :[]
[2022-11-23 05:35:27,640] [    INFO] - sharding_degree               :-1
[2022-11-23 05:35:27,641] [    INFO] - should_log                    :True
[2022-11-23 05:35:27,641] [    INFO] - should_save                   :True
[2022-11-23 05:35:27,641] [    INFO] - strategy                      :dynabert+ptq
[2022-11-23 05:35:27,642] [    INFO] - train_batch_size              :32
[2022-11-23 05:35:27,642] [    INFO] - use_pact                      :True
[2022-11-23 05:35:27,642] [    INFO] - warmup_ratio                  :0.1
[2022-11-23 05:35:27,643] [    INFO] - warmup_steps                  :0
[2022-11-23 05:35:27,643] [    INFO] - weight_decay                  :0.0
[2022-11-23 05:35:27,643] [    INFO] - weight_quantize_type          :channel_wise_abs_max
[2022-11-23 05:35:27,644] [    INFO] - width_mult_list               :None
[2022-11-23 05:35:27,644] [    INFO] - world_size                    :1
[2022-11-23 05:35:27,644] [    INFO] - 
[2022-11-23 05:35:27,646] [    INFO] - ***** Running training *****
[2022-11-23 05:35:27,646] [    INFO] -   Num examples = 10631
[2022-11-23 05:35:27,647] [    INFO] -   Num Epochs = 5
[2022-11-23 05:35:27,647] [    INFO] -   Instantaneous batch size per device = 32
[2022-11-23 05:35:27,647] [    INFO] -   Total train batch size (w. parallel, distributed & accumulation) = 32
[2022-11-23 05:35:27,648] [    INFO] -   Gradient Accumulation steps = 1
[2022-11-23 05:35:27,648] [    INFO] -   Total optimization steps = 1665.0
[2022-11-23 05:35:27,648] [    INFO] -   Total num train samples = 53155.0
[2022-11-23 05:35:27,846] [    INFO] -   Number of trainable parameters = 109485314
[2022-11-23 05:35:38,718] [    INFO] - loss: 0.00220406, learning_rate: 2e-05, global_step: 10, interval_runtime: 10.8645, interval_samples_per_second: 29.454, interval_steps_per_second: 0.92, epoch: 0.03
[2022-11-23 05:35:48,959] [    INFO] - loss: 0.00207934, learning_rate: 2e-05, global_step: 20, interval_runtime: 10.2417, interval_samples_per_second: 31.245, interval_steps_per_second: 0.976, epoch: 0.0601
[2022-11-23 05:35:59,216] [    INFO] - loss: 0.00126031, learning_rate: 2e-05, global_step: 30, interval_runtime: 10.2564, interval_samples_per_second: 31.2, interval_steps_per_second: 0.975, epoch: 0.0901
[2022-11-23 05:36:03,783] [    INFO] - loss: 0.0013646, learning_rate: 2e-05, global_step: 40, interval_runtime: 4.5671, interval_samples_per_second: 70.066, interval_steps_per_second: 2.19, epoch: 0.1201
[2022-11-23 05:36:14,029] [    INFO] - loss: 0.00103009, learning_rate: 2e-05, global_step: 50, interval_runtime: 10.2461, interval_samples_per_second: 31.231, interval_steps_per_second: 0.976, epoch: 0.1502
[2022-11-23 05:36:24,288] [    INFO] - loss: 0.00078573, learning_rate: 2e-05, global_step: 60, interval_runtime: 10.2587, interval_samples_per_second: 31.193, interval_steps_per_second: 0.975, epoch: 0.1802
[2022-11-23 05:36:34,552] [    INFO] - loss: 0.00088691, learning_rate: 2e-05, global_step: 70, interval_runtime: 10.2639, interval_samples_per_second: 31.177, interval_steps_per_second: 0.974, epoch: 0.2102

lr_scheduler_type is now SchedulerType.COSINE, yet the learning_rate still stays unchanged T-T @ZHUI

ZHUI commented 1 year ago

Is this the model compression API? I see dynabert+ptq being used.

ZHUI commented 1 year ago

@LiuChiachi could you please take a look?

ZHUI commented 1 year ago

Could you post a link to the code you are using so we can look into it?

ZHUI commented 1 year ago

If this is the UIE finetune code, you can delete these lines:

https://github.com/PaddlePaddle/PaddleNLP/blob/fc7cdc025f70f5236420a56487dd3a983665b4fd/model_zoo/uie/finetune.py#L198-L200
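
Those lines build an optimizer by hand and attach it to the Trainer, which is why lr_scheduler_type is ignored. A hedged sketch of the pattern (illustrative only; the exact code is at the link above, and `training_args`, `model`, and `trainer` come from the script's context):

```python
# Illustrative sketch of the problematic pattern (see the link above for the
# actual lines). Passing a bare float as learning_rate yields a fixed lr, and
# attaching this optimizer bypasses the scheduler the Trainer would otherwise
# create from --lr_scheduler_type.
optimizer = paddle.optimizer.AdamW(
    learning_rate=training_args.learning_rate,  # plain float => constant lr
    parameters=model.parameters())
trainer.optimizer = optimizer
```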

briup1 commented 1 year ago

That fixed it, thanks!!! Will these lines be removed later as well? Is the reason they can simply be deleted that the Trainer provides a default optimizer?

ZHUI commented 1 year ago

Correct, they will be removed later. The Trainer needs to control the optimizer itself; otherwise many features (such as lr schedulers) cannot take effect.

@LiuChiachi please remove them in a follow-up.
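
For anyone landing here later: once the Trainer owns the optimizer, it also builds the lr scheduler from TrainingArguments. A hedged sketch of the intended usage (API names per PaddleNLP v2.4.x; `model`, `train_ds`, and `dev_ds` are placeholders assumed from the finetune script):

```python
# Hedged sketch: let the Trainer create both the optimizer and the lr
# scheduler from TrainingArguments, so lr_scheduler_type takes effect.
# `model`, `train_ds`, and `dev_ds` are placeholders from the script context.
from paddlenlp.trainer import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./finetuned_model_qq",
    learning_rate=2e-5,
    lr_scheduler_type="cosine",  # honored once no custom optimizer is set
    warmup_ratio=0.1,
    num_train_epochs=5,
    do_train=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=dev_ds,
    # note: no custom optimizer passed here; the Trainer builds its own
)
trainer.train()
```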