Closed · briup1 closed this issue 1 year ago
Hi, could you try upgrading paddlenlp to the latest version? pip install paddlenlp==2.4.3
After upgrading, it still does not take effect. @ZHUI
[2022-11-23 11:11:17,169] [ INFO] - max_steps is given, it will override any value given in num_train_epochs
[2022-11-23 11:11:17,170] [ INFO] - ============================================================
[2022-11-23 11:11:17,171] [ INFO] - Training Configuration Arguments
[2022-11-23 11:11:17,171] [ INFO] - paddle commit id :2ea3700a336ea844389298a7520b386d4ec5fc3b
[2022-11-23 11:11:17,171] [ INFO] - _no_sync_in_gradient_accumulation:True
[2022-11-23 11:11:17,172] [ INFO] - adam_beta1 :0.9
[2022-11-23 11:11:17,172] [ INFO] - adam_beta2 :0.999
[2022-11-23 11:11:17,172] [ INFO] - adam_epsilon :1e-08
[2022-11-23 11:11:17,172] [ INFO] - bf16 :False
[2022-11-23 11:11:17,173] [ INFO] - bf16_full_eval :False
[2022-11-23 11:11:17,173] [ INFO] - current_device :gpu:4
[2022-11-23 11:11:17,173] [ INFO] - dataloader_drop_last :False
[2022-11-23 11:11:17,174] [ INFO] - dataloader_num_workers :0
[2022-11-23 11:11:17,174] [ INFO] - device :gpu
[2022-11-23 11:11:17,174] [ INFO] - disable_tqdm :True
[2022-11-23 11:11:17,174] [ INFO] - do_eval :True
[2022-11-23 11:11:17,175] [ INFO] - do_export :True
[2022-11-23 11:11:17,175] [ INFO] - do_predict :True
[2022-11-23 11:11:17,175] [ INFO] - do_train :True
[2022-11-23 11:11:17,176] [ INFO] - eval_batch_size :32
[2022-11-23 11:11:17,176] [ INFO] - eval_steps :200
[2022-11-23 11:11:17,176] [ INFO] - evaluation_strategy :IntervalStrategy.STEPS
[2022-11-23 11:11:17,176] [ INFO] - fp16 :False
[2022-11-23 11:11:17,177] [ INFO] - fp16_full_eval :False
[2022-11-23 11:11:17,177] [ INFO] - fp16_opt_level :O2
[2022-11-23 11:11:17,177] [ INFO] - gradient_accumulation_steps :2
[2022-11-23 11:11:17,177] [ INFO] - greater_is_better :True
[2022-11-23 11:11:17,178] [ INFO] - ignore_data_skip :False
[2022-11-23 11:11:17,178] [ INFO] - label_names :None
[2022-11-23 11:11:17,178] [ INFO] - learning_rate :0.0001
[2022-11-23 11:11:17,179] [ INFO] - load_best_model_at_end :True
[2022-11-23 11:11:17,179] [ INFO] - local_process_index :0
[2022-11-23 11:11:17,179] [ INFO] - local_rank :0
[2022-11-23 11:11:17,180] [ INFO] - log_level :-1
[2022-11-23 11:11:17,180] [ INFO] - log_level_replica :-1
[2022-11-23 11:11:17,180] [ INFO] - log_on_each_node :True
[2022-11-23 11:11:17,180] [ INFO] - logging_dir :./tmp/xnli_cn/runs/Nov23_11-11-07_yq01-qianmo-com-255-129-12.yq01
[2022-11-23 11:11:17,181] [ INFO] - logging_first_step :False
[2022-11-23 11:11:17,181] [ INFO] - logging_steps :10
[2022-11-23 11:11:17,181] [ INFO] - logging_strategy :IntervalStrategy.STEPS
[2022-11-23 11:11:17,182] [ INFO] - lr_scheduler_type :SchedulerType.COSINE
[2022-11-23 11:11:17,182] [ INFO] - max_grad_norm :1.0
[2022-11-23 11:11:17,182] [ INFO] - max_steps :10000
[2022-11-23 11:11:17,183] [ INFO] - metric_for_best_model :eval_accuracy
[2022-11-23 11:11:17,183] [ INFO] - minimum_eval_times :-1
[2022-11-23 11:11:17,183] [ INFO] - no_cuda :False
[2022-11-23 11:11:17,184] [ INFO] - num_train_epochs :3
[2022-11-23 11:11:17,184] [ INFO] - optim :OptimizerNames.ADAMW
[2022-11-23 11:11:17,184] [ INFO] - output_dir :./tmp/xnli_cn
[2022-11-23 11:11:17,184] [ INFO] - overwrite_output_dir :False
[2022-11-23 11:11:17,185] [ INFO] - past_index :-1
[2022-11-23 11:11:17,185] [ INFO] - per_device_eval_batch_size :32
[2022-11-23 11:11:17,185] [ INFO] - per_device_train_batch_size :32
[2022-11-23 11:11:17,185] [ INFO] - prediction_loss_only :False
[2022-11-23 11:11:17,186] [ INFO] - process_index :0
[2022-11-23 11:11:17,186] [ INFO] - recompute :True
[2022-11-23 11:11:17,186] [ INFO] - remove_unused_columns :True
[2022-11-23 11:11:17,187] [ INFO] - report_to :['visualdl']
[2022-11-23 11:11:17,187] [ INFO] - resume_from_checkpoint :None
[2022-11-23 11:11:17,187] [ INFO] - run_name :./tmp/xnli_cn
[2022-11-23 11:11:17,187] [ INFO] - save_on_each_node :False
[2022-11-23 11:11:17,188] [ INFO] - save_steps :200
[2022-11-23 11:11:17,188] [ INFO] - save_strategy :IntervalStrategy.STEPS
[2022-11-23 11:11:17,188] [ INFO] - save_total_limit :3
[2022-11-23 11:11:17,189] [ INFO] - scale_loss :32768
[2022-11-23 11:11:17,189] [ INFO] - seed :42
[2022-11-23 11:11:17,189] [ INFO] - sharding :[]
[2022-11-23 11:11:17,189] [ INFO] - sharding_degree :2
[2022-11-23 11:11:17,190] [ INFO] - should_log :True
[2022-11-23 11:11:17,190] [ INFO] - should_save :True
[2022-11-23 11:11:17,190] [ INFO] - train_batch_size :32
[2022-11-23 11:11:17,190] [ INFO] - warmup_ratio :0.0
[2022-11-23 11:11:17,191] [ INFO] - warmup_steps :0
[2022-11-23 11:11:17,191] [ INFO] - weight_decay :0.01
[2022-11-23 11:11:17,191] [ INFO] - world_size :4
[2022-11-23 11:11:17,191] [ INFO] -
[2022-11-23 11:11:17,254] [ INFO] - ***** Running training *****
[2022-11-23 11:11:17,254] [ INFO] - Num examples = 392701
[2022-11-23 11:11:17,254] [ INFO] - Num Epochs = 7
[2022-11-23 11:11:17,255] [ INFO] - Instantaneous batch size per device = 32
[2022-11-23 11:11:17,255] [ INFO] - Total train batch size (w. parallel, distributed & accumulation) = 256
[2022-11-23 11:11:17,255] [ INFO] - Gradient Accumulation steps = 2
[2022-11-23 11:11:17,255] [ INFO] - Total optimization steps = 10000
[2022-11-23 11:11:17,255] [ INFO] - Total num train samples = 2560000
[2022-11-23 11:11:18,787] [ INFO] - Number of trainable parameters = 99868419
[2022-11-23 11:11:27,064] [ INFO] - loss: 1.0899848, learning_rate: 9.999975326009292e-05, global_step: 10, interval_runtime: 8.2708, interval_samples_per_second: 309.523, interval_steps_per_second: 1.209, epoch: 0.0065
[2022-11-23 11:11:32,768] [ INFO] - loss: 0.94208021, learning_rate: 9.999901304280685e-05, global_step: 20, interval_runtime: 5.7045, interval_samples_per_second: 448.768, interval_steps_per_second: 1.753, epoch: 0.013
[2022-11-23 11:11:38,476] [ INFO] - loss: 0.86334038, learning_rate: 9.99977793554475e-05, global_step: 30, interval_runtime: 5.7078, interval_samples_per_second: 448.509, interval_steps_per_second: 1.752, epoch: 0.0196
[2022-11-23 11:11:44,074] [ INFO] - loss: 0.80716286, learning_rate: 9.999605221019081e-05, global_step: 40, interval_runtime: 5.5982, interval_samples_per_second: 457.29, interval_steps_per_second: 1.786, epoch: 0.0261
[2022-11-23 11:11:49,725] [ INFO] - loss: 0.7661695, learning_rate: 9.999383162408304e-05, global_step: 50, interval_runtime: 5.6511, interval_samples_per_second: 453.009, interval_steps_per_second: 1.77, epoch: 0.0326
[2022-11-23 11:11:55,454] [ INFO] - loss: 0.73705273, learning_rate: 9.999111761904046e-05, global_step: 60, interval_runtime: 5.7298, interval_samples_per_second: 446.789, interval_steps_per_second: 1.745, epoch: 0.0391
[2022-11-23 11:12:01,294] [ INFO] - loss: 0.7377521, learning_rate: 9.998791022184922e-05, global_step: 70, interval_runtime: 5.8394, interval_samples_per_second: 438.401, interval_steps_per_second: 1.713, epoch: 0.0456
[2022-11-23 11:12:06,893] [ INFO] - loss: 0.71224113, learning_rate: 9.9984209464165e-05, global_step: 80, interval_runtime: 5.5986, interval_samples_per_second: 457.256, interval_steps_per_second: 1.786, epoch: 0.0522
@briup1 Above is my run log; as you can see, the scheduler does take effect. Please paste your log.
[2022-11-23 01:26:08,092] [ INFO] - We are using <class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'> to load 'uie-base-en'.
[2022-11-23 01:26:15,034] [ INFO] - ============================================================
[2022-11-23 01:26:15,034] [ INFO] - Training Configuration Arguments
[2022-11-23 01:26:15,034] [ INFO] - paddle commit id :590b4dbcdd989324089ce43c22ef151c746c92a3
[2022-11-23 01:26:15,035] [ INFO] - _no_sync_in_gradient_accumulation:True
[2022-11-23 01:26:15,035] [ INFO] - activation_quantize_type :None
[2022-11-23 01:26:15,035] [ INFO] - adam_beta1 :0.9
[2022-11-23 01:26:15,036] [ INFO] - adam_beta2 :0.999
[2022-11-23 01:26:15,036] [ INFO] - adam_epsilon :1e-08
[2022-11-23 01:26:15,036] [ INFO] - algo_list :None
[2022-11-23 01:26:15,036] [ INFO] - batch_num_list :None
[2022-11-23 01:26:15,037] [ INFO] - batch_size_list :None
[2022-11-23 01:26:15,037] [ INFO] - bf16 :False
[2022-11-23 01:26:15,037] [ INFO] - bf16_full_eval :False
[2022-11-23 01:26:15,038] [ INFO] - bias_correction :False
[2022-11-23 01:26:15,039] [ INFO] - current_device :gpu:0
[2022-11-23 01:26:15,039] [ INFO] - dataloader_drop_last :False
[2022-11-23 01:26:15,040] [ INFO] - dataloader_num_workers :0
[2022-11-23 01:26:15,040] [ INFO] - device :gpu
[2022-11-23 01:26:15,041] [ INFO] - disable_tqdm :True
[2022-11-23 01:26:15,041] [ INFO] - do_compress :False
[2022-11-23 01:26:15,042] [ INFO] - do_eval :True
[2022-11-23 01:26:15,042] [ INFO] - do_export :True
[2022-11-23 01:26:15,042] [ INFO] - do_predict :False
[2022-11-23 01:26:15,043] [ INFO] - do_train :True
[2022-11-23 01:26:15,043] [ INFO] - eval_batch_size :32
[2022-11-23 01:26:15,044] [ INFO] - eval_steps :100
[2022-11-23 01:26:15,044] [ INFO] - evaluation_strategy :IntervalStrategy.STEPS
[2022-11-23 01:26:15,044] [ INFO] - fp16 :False
[2022-11-23 01:26:15,045] [ INFO] - fp16_full_eval :False
[2022-11-23 01:26:15,045] [ INFO] - fp16_opt_level :O1
[2022-11-23 01:26:15,046] [ INFO] - gradient_accumulation_steps :1
[2022-11-23 01:26:15,046] [ INFO] - greater_is_better :True
[2022-11-23 01:26:15,047] [ INFO] - ignore_data_skip :True
[2022-11-23 01:26:15,047] [ INFO] - input_infer_model_path :None
[2022-11-23 01:26:15,047] [ INFO] - label_names :['start_positions', 'end_positions']
[2022-11-23 01:26:15,047] [ INFO] - learning_rate :2e-05
[2022-11-23 01:26:15,048] [ INFO] - load_best_model_at_end :True
[2022-11-23 01:26:15,048] [ INFO] - local_process_index :0
[2022-11-23 01:26:15,049] [ INFO] - local_rank :0
[2022-11-23 01:26:15,049] [ INFO] - log_level :-1
[2022-11-23 01:26:15,049] [ INFO] - log_level_replica :-1
[2022-11-23 01:26:15,050] [ INFO] - log_on_each_node :True
[2022-11-23 01:26:15,050] [ INFO] - logging_dir :./finetuned_model_qq/runs/Nov23_01-24-57_6bf6d99aotq62-0
[2022-11-23 01:26:15,050] [ INFO] - logging_first_step :False
[2022-11-23 01:26:15,051] [ INFO] - logging_steps :20
[2022-11-23 01:26:15,051] [ INFO] - logging_strategy :IntervalStrategy.STEPS
[2022-11-23 01:26:15,052] [ INFO] - lr_scheduler_type :SchedulerType.CONSTANT
[2022-11-23 01:26:15,052] [ INFO] - max_grad_norm :1.0
[2022-11-23 01:26:15,053] [ INFO] - max_steps :-1
[2022-11-23 01:26:15,053] [ INFO] - metric_for_best_model :eval_f1
[2022-11-23 01:26:15,054] [ INFO] - minimum_eval_times :None
[2022-11-23 01:26:15,054] [ INFO] - moving_rate :0.9
[2022-11-23 01:26:15,054] [ INFO] - no_cuda :False
[2022-11-23 01:26:15,055] [ INFO] - num_train_epochs :50.0
[2022-11-23 01:26:15,055] [ INFO] - onnx_format :True
[2022-11-23 01:26:15,055] [ INFO] - optim :OptimizerNames.ADAMW
[2022-11-23 01:26:15,056] [ INFO] - output_dir :./finetuned_model_qq
[2022-11-23 01:26:15,056] [ INFO] - overwrite_output_dir :True
[2022-11-23 01:26:15,057] [ INFO] - past_index :-1
[2022-11-23 01:26:15,057] [ INFO] - per_device_eval_batch_size :32
[2022-11-23 01:26:15,057] [ INFO] - per_device_train_batch_size :32
[2022-11-23 01:26:15,058] [ INFO] - prediction_loss_only :False
[2022-11-23 01:26:15,058] [ INFO] - process_index :0
[2022-11-23 01:26:15,058] [ INFO] - recompute :False
[2022-11-23 01:26:15,059] [ INFO] - remove_unused_columns :True
[2022-11-23 01:26:15,059] [ INFO] - report_to :['visualdl']
[2022-11-23 01:26:15,059] [ INFO] - resume_from_checkpoint :None
[2022-11-23 01:26:15,060] [ INFO] - round_type :round
[2022-11-23 01:26:15,060] [ INFO] - run_name :./finetuned_model_qq
[2022-11-23 01:26:15,061] [ INFO] - save_on_each_node :False
[2022-11-23 01:26:15,061] [ INFO] - save_steps :100
[2022-11-23 01:26:15,061] [ INFO] - save_strategy :IntervalStrategy.STEPS
[2022-11-23 01:26:15,062] [ INFO] - save_total_limit :1
[2022-11-23 01:26:15,062] [ INFO] - scale_loss :32768
[2022-11-23 01:26:15,062] [ INFO] - seed :42
[2022-11-23 01:26:15,062] [ INFO] - sharding :[]
[2022-11-23 01:26:15,063] [ INFO] - sharding_degree :-1
[2022-11-23 01:26:15,063] [ INFO] - should_log :True
[2022-11-23 01:26:15,064] [ INFO] - should_save :True
[2022-11-23 01:26:15,064] [ INFO] - strategy :dynabert+ptq
[2022-11-23 01:26:15,064] [ INFO] - train_batch_size :32
[2022-11-23 01:26:15,065] [ INFO] - use_pact :True
[2022-11-23 01:26:15,065] [ INFO] - warmup_ratio :0.1
[2022-11-23 01:26:15,066] [ INFO] - warmup_steps :0
[2022-11-23 01:26:15,066] [ INFO] - weight_decay :0.0
[2022-11-23 01:26:15,066] [ INFO] - weight_quantize_type :channel_wise_abs_max
[2022-11-23 01:26:15,067] [ INFO] - width_mult_list :None
[2022-11-23 01:26:15,067] [ INFO] - world_size :4
[2022-11-23 01:26:15,067] [ INFO] -
[2022-11-23 01:26:15,155] [ INFO] - ***** Running training *****
[2022-11-23 01:26:15,155] [ INFO] - Num examples = 10631
[2022-11-23 01:26:15,156] [ INFO] - Num Epochs = 50
[2022-11-23 01:26:15,156] [ INFO] - Instantaneous batch size per device = 32
[2022-11-23 01:26:15,156] [ INFO] - Total train batch size (w. parallel, distributed & accumulation) = 128
[2022-11-23 01:26:15,157] [ INFO] - Gradient Accumulation steps = 1
[2022-11-23 01:26:15,157] [ INFO] - Total optimization steps = 4200.0
[2022-11-23 01:26:15,158] [ INFO] - Total num train samples = 531550.0
[2022-11-23 01:26:15,429] [ INFO] - Number of trainable parameters = 109485314
[2022-11-23 01:26:37,530] [ INFO] - loss: 0.00170259, learning_rate: 2e-05, global_step: 20, interval_runtime: 22.0947, interval_samples_per_second: 115.865, interval_steps_per_second: 0.905, epoch: 0.2381
[2022-11-23 01:26:58,366] [ INFO] - loss: 0.00105053, learning_rate: 2e-05, global_step: 40, interval_runtime: 20.8368, interval_samples_per_second: 122.86, interval_steps_per_second: 0.96, epoch: 0.4762
[2022-11-23 01:27:13,205] [ INFO] - loss: 0.00058154, learning_rate: 2e-05, global_step: 60, interval_runtime: 14.8386, interval_samples_per_second: 172.522, interval_steps_per_second: 1.348, epoch: 0.7143
[2022-11-23 01:27:34,129] [ INFO] - loss: 0.00063376, learning_rate: 2e-05, global_step: 80, interval_runtime: 20.924, interval_samples_per_second: 122.347, interval_steps_per_second: 0.956, epoch: 0.9524
[2022-11-23 01:27:54,536] [ INFO] - loss: 0.00043808, learning_rate: 2e-05, global_step: 100, interval_runtime: 20.4046, interval_samples_per_second: 125.462, interval_steps_per_second: 0.98, epoch: 1.1905
@ZHUI In this run I set constant, but with cosine the result is the same.
Name: paddlenlp
Version: 2.4.3
Summary: Easy-to-use and powerful NLP library with Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including Neural Search, Question Answering, Information Extraction and Sentiment Analysis end-to-end system.
Home-page: https://github.com/PaddlePaddle/PaddleNLP
Author: PaddleNLP Team
Author-email: paddlenlp@baidu.com
License: Apache 2.0
Location: /zhangxuejie373/anaconda3/envs/p38_paddle/lib/python3.8/site-packages
Requires: colorama, colorlog, datasets, dill, jieba, multiprocess, paddle2onnx, paddlefsl, protobuf, sentencepiece, seqeval, tqdm, visualdl
Required-by:
In your log, the lr_scheduler_type is constant. Please check your launch command and how the argument is being passed:
lr_scheduler_type :SchedulerType.CONSTANT
With constant, the learning rate not changing is the expected behavior.
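For reference, the expected difference between the two schedules can be sketched in plain Python. This is a simplified illustration of the usual formulas, not PaddleNLP's actual implementation; the function name and signature are made up for this sketch:

```python
import math

def lr_at_step(base_lr, step, total_steps, schedule="constant", warmup_steps=0):
    """Toy schedule: 'constant' keeps base_lr fixed; 'cosine' decays it
    toward 0 over total_steps, after an optional linear warmup."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    if schedule == "constant":
        return base_lr
    # cosine decay from base_lr down to 0
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# constant: lr never changes, exactly as seen in the logs above
print(lr_at_step(2e-5, 100, 1665, "constant"))  # 2e-05
# cosine: lr should visibly shrink as training progresses
print(lr_at_step(2e-5, 100, 1665, "cosine"))
print(lr_at_step(2e-5, 1600, 1665, "cosine"))
```

So a cosine run whose logged learning_rate stays pinned at 2e-05 for hundreds of steps is a sign the scheduler is not wired in at all, not just decaying slowly.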
[2022-11-23 05:35:27,615] [ INFO] - ============================================================
[2022-11-23 05:35:27,615] [ INFO] - Training Configuration Arguments
[2022-11-23 05:35:27,615] [ INFO] - paddle commit id :590b4dbcdd989324089ce43c22ef151c746c92a3
[2022-11-23 05:35:27,616] [ INFO] - _no_sync_in_gradient_accumulation:True
[2022-11-23 05:35:27,616] [ INFO] - activation_quantize_type :None
[2022-11-23 05:35:27,617] [ INFO] - adam_beta1 :0.9
[2022-11-23 05:35:27,617] [ INFO] - adam_beta2 :0.999
[2022-11-23 05:35:27,617] [ INFO] - adam_epsilon :1e-08
[2022-11-23 05:35:27,618] [ INFO] - algo_list :None
[2022-11-23 05:35:27,618] [ INFO] - batch_num_list :None
[2022-11-23 05:35:27,618] [ INFO] - batch_size_list :None
[2022-11-23 05:35:27,619] [ INFO] - bf16 :False
[2022-11-23 05:35:27,619] [ INFO] - bf16_full_eval :False
[2022-11-23 05:35:27,619] [ INFO] - bias_correction :False
[2022-11-23 05:35:27,620] [ INFO] - current_device :gpu:3
[2022-11-23 05:35:27,620] [ INFO] - dataloader_drop_last :False
[2022-11-23 05:35:27,620] [ INFO] - dataloader_num_workers :0
[2022-11-23 05:35:27,621] [ INFO] - device :gpu
[2022-11-23 05:35:27,621] [ INFO] - disable_tqdm :True
[2022-11-23 05:35:27,621] [ INFO] - do_compress :False
[2022-11-23 05:35:27,622] [ INFO] - do_eval :True
[2022-11-23 05:35:27,622] [ INFO] - do_export :True
[2022-11-23 05:35:27,622] [ INFO] - do_predict :False
[2022-11-23 05:35:27,623] [ INFO] - do_train :True
[2022-11-23 05:35:27,623] [ INFO] - eval_batch_size :32
[2022-11-23 05:35:27,623] [ INFO] - eval_steps :100
[2022-11-23 05:35:27,624] [ INFO] - evaluation_strategy :IntervalStrategy.STEPS
[2022-11-23 05:35:27,624] [ INFO] - fp16 :False
[2022-11-23 05:35:27,624] [ INFO] - fp16_full_eval :False
[2022-11-23 05:35:27,625] [ INFO] - fp16_opt_level :O1
[2022-11-23 05:35:27,625] [ INFO] - gradient_accumulation_steps :1
[2022-11-23 05:35:27,625] [ INFO] - greater_is_better :True
[2022-11-23 05:35:27,626] [ INFO] - ignore_data_skip :True
[2022-11-23 05:35:27,626] [ INFO] - input_infer_model_path :None
[2022-11-23 05:35:27,626] [ INFO] - label_names :['start_positions', 'end_positions']
[2022-11-23 05:35:27,627] [ INFO] - learning_rate :2e-05
[2022-11-23 05:35:27,627] [ INFO] - load_best_model_at_end :True
[2022-11-23 05:35:27,627] [ INFO] - local_process_index :0
[2022-11-23 05:35:27,628] [ INFO] - local_rank :-1
[2022-11-23 05:35:27,628] [ INFO] - log_level :-1
[2022-11-23 05:35:27,628] [ INFO] - log_level_replica :-1
[2022-11-23 05:35:27,629] [ INFO] - log_on_each_node :True
[2022-11-23 05:35:27,629] [ INFO] - logging_dir :./finetuned_model_qq/runs/Nov23_05-35-18_6bf6d99aotq62-0
[2022-11-23 05:35:27,630] [ INFO] - logging_first_step :False
[2022-11-23 05:35:27,630] [ INFO] - logging_steps :10
[2022-11-23 05:35:27,630] [ INFO] - logging_strategy :IntervalStrategy.STEPS
[2022-11-23 05:35:27,631] [ INFO] - lr_scheduler_type :SchedulerType.COSINE
[2022-11-23 05:35:27,631] [ INFO] - max_grad_norm :1.0
[2022-11-23 05:35:27,631] [ INFO] - max_steps :-1
[2022-11-23 05:35:27,631] [ INFO] - metric_for_best_model :eval_f1
[2022-11-23 05:35:27,632] [ INFO] - minimum_eval_times :None
[2022-11-23 05:35:27,632] [ INFO] - moving_rate :0.9
[2022-11-23 05:35:27,633] [ INFO] - no_cuda :False
[2022-11-23 05:35:27,633] [ INFO] - num_train_epochs :5.0
[2022-11-23 05:35:27,633] [ INFO] - onnx_format :True
[2022-11-23 05:35:27,634] [ INFO] - optim :OptimizerNames.ADAMW
[2022-11-23 05:35:27,634] [ INFO] - output_dir :./finetuned_model_qq
[2022-11-23 05:35:27,634] [ INFO] - overwrite_output_dir :True
[2022-11-23 05:35:27,635] [ INFO] - past_index :-1
[2022-11-23 05:35:27,635] [ INFO] - per_device_eval_batch_size :32
[2022-11-23 05:35:27,635] [ INFO] - per_device_train_batch_size :32
[2022-11-23 05:35:27,635] [ INFO] - prediction_loss_only :False
[2022-11-23 05:35:27,636] [ INFO] - process_index :0
[2022-11-23 05:35:27,636] [ INFO] - recompute :False
[2022-11-23 05:35:27,636] [ INFO] - remove_unused_columns :True
[2022-11-23 05:35:27,637] [ INFO] - report_to :['visualdl']
[2022-11-23 05:35:27,637] [ INFO] - resume_from_checkpoint :None
[2022-11-23 05:35:27,637] [ INFO] - round_type :round
[2022-11-23 05:35:27,638] [ INFO] - run_name :./finetuned_model_qq
[2022-11-23 05:35:27,638] [ INFO] - save_on_each_node :False
[2022-11-23 05:35:27,638] [ INFO] - save_steps :100
[2022-11-23 05:35:27,639] [ INFO] - save_strategy :IntervalStrategy.STEPS
[2022-11-23 05:35:27,639] [ INFO] - save_total_limit :1
[2022-11-23 05:35:27,639] [ INFO] - scale_loss :32768
[2022-11-23 05:35:27,640] [ INFO] - seed :42
[2022-11-23 05:35:27,640] [ INFO] - sharding :[]
[2022-11-23 05:35:27,640] [ INFO] - sharding_degree :-1
[2022-11-23 05:35:27,641] [ INFO] - should_log :True
[2022-11-23 05:35:27,641] [ INFO] - should_save :True
[2022-11-23 05:35:27,641] [ INFO] - strategy :dynabert+ptq
[2022-11-23 05:35:27,642] [ INFO] - train_batch_size :32
[2022-11-23 05:35:27,642] [ INFO] - use_pact :True
[2022-11-23 05:35:27,642] [ INFO] - warmup_ratio :0.1
[2022-11-23 05:35:27,643] [ INFO] - warmup_steps :0
[2022-11-23 05:35:27,643] [ INFO] - weight_decay :0.0
[2022-11-23 05:35:27,643] [ INFO] - weight_quantize_type :channel_wise_abs_max
[2022-11-23 05:35:27,644] [ INFO] - width_mult_list :None
[2022-11-23 05:35:27,644] [ INFO] - world_size :1
[2022-11-23 05:35:27,644] [ INFO] -
[2022-11-23 05:35:27,646] [ INFO] - ***** Running training *****
[2022-11-23 05:35:27,646] [ INFO] - Num examples = 10631
[2022-11-23 05:35:27,647] [ INFO] - Num Epochs = 5
[2022-11-23 05:35:27,647] [ INFO] - Instantaneous batch size per device = 32
[2022-11-23 05:35:27,647] [ INFO] - Total train batch size (w. parallel, distributed & accumulation) = 32
[2022-11-23 05:35:27,648] [ INFO] - Gradient Accumulation steps = 1
[2022-11-23 05:35:27,648] [ INFO] - Total optimization steps = 1665.0
[2022-11-23 05:35:27,648] [ INFO] - Total num train samples = 53155.0
[2022-11-23 05:35:27,846] [ INFO] - Number of trainable parameters = 109485314
[2022-11-23 05:35:38,718] [ INFO] - loss: 0.00220406, learning_rate: 2e-05, global_step: 10, interval_runtime: 10.8645, interval_samples_per_second: 29.454, interval_steps_per_second: 0.92, epoch: 0.03
[2022-11-23 05:35:48,959] [ INFO] - loss: 0.00207934, learning_rate: 2e-05, global_step: 20, interval_runtime: 10.2417, interval_samples_per_second: 31.245, interval_steps_per_second: 0.976, epoch: 0.0601
[2022-11-23 05:35:59,216] [ INFO] - loss: 0.00126031, learning_rate: 2e-05, global_step: 30, interval_runtime: 10.2564, interval_samples_per_second: 31.2, interval_steps_per_second: 0.975, epoch: 0.0901
[2022-11-23 05:36:03,783] [ INFO] - loss: 0.0013646, learning_rate: 2e-05, global_step: 40, interval_runtime: 4.5671, interval_samples_per_second: 70.066, interval_steps_per_second: 2.19, epoch: 0.1201
[2022-11-23 05:36:14,029] [ INFO] - loss: 0.00103009, learning_rate: 2e-05, global_step: 50, interval_runtime: 10.2461, interval_samples_per_second: 31.231, interval_steps_per_second: 0.976, epoch: 0.1502
[2022-11-23 05:36:24,288] [ INFO] - loss: 0.00078573, learning_rate: 2e-05, global_step: 60, interval_runtime: 10.2587, interval_samples_per_second: 31.193, interval_steps_per_second: 0.975, epoch: 0.1802
[2022-11-23 05:36:34,552] [ INFO] - loss: 0.00088691, learning_rate: 2e-05, global_step: 70, interval_runtime: 10.2639, interval_samples_per_second: 31.177, interval_steps_per_second: 0.974, epoch: 0.2102
lr_scheduler_type :SchedulerType.COSINE — now with COSINE, the learning_rate still stays unchanged T-T @ZHUI
Is this the model compression API? I see it uses dynabert+ptq.
@LiuChiachi could you take a look?
Could you paste a link to the code you are using so we can investigate?
If it is the UIE finetune code, you can delete this line:
That works now, thanks!!! Will this line also be removed later? Is the reason it can simply be deleted that the Trainer falls back to a default optimizer?
Yes, it will be removed later. The Trainer needs to control the optimizer itself; otherwise many features cannot take effect.
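The failure mode described here can be modeled with a minimal sketch (hypothetical class and field names, not PaddleNLP's real Trainer): when the caller constructs its own optimizer with a fixed learning rate and hands it in, the Trainer keeps it as-is and never builds its lr scheduler, so lr_scheduler_type is silently ignored.

```python
class ToyTrainer:
    """Simplified model of the behavior: a user-supplied optimizer
    bypasses the Trainer's own optimizer/scheduler construction."""

    def __init__(self, learning_rate, lr_scheduler_type, optimizer=None):
        self.learning_rate = learning_rate
        self.lr_scheduler_type = lr_scheduler_type
        self.optimizer = optimizer

    def create_optimizer(self):
        if self.optimizer is not None:
            # The user already built one -> the scheduler setting is ignored.
            return self.optimizer
        # Only here does the Trainer honor lr_scheduler_type.
        return {"lr": self.learning_rate, "schedule": self.lr_scheduler_type}

# Passing a pre-built optimizer freezes the schedule regardless of the flag:
user_opt = {"lr": 2e-5, "schedule": "constant"}
print(ToyTrainer(2e-5, "cosine", optimizer=user_opt).create_optimizer()["schedule"])  # constant
# Letting the Trainer build its own optimizer respects the flag:
print(ToyTrainer(2e-5, "cosine").create_optimizer()["schedule"])  # cosine
```

Deleting the line that creates and passes the optimizer corresponds to the second case: the Trainer builds its own optimizer, and the configured scheduler finally drives the learning rate.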
@LiuChiachi please remove it in a follow-up.
Please describe your question
When using UIE's finetune.py script and passing lr_scheduler_type cosine, the monitored lr does not change. Is this normal?