[09/10 10:37:57 libai]: >>> done with building model. Building time: 0.282 seconds
WARNING [09/10 10:37:57 lb.scheduler.lr_scheduler]: warmup iters equals to zero, return CosineLR
[09/10 10:38:03 lb.engine.trainer]: Starting training from iteration 0
[09/10 10:40:56 lb.utils.events]: eta: 21:00:38 iteration: 19/10000 consumed_samples: 80 total_loss: 9.895 time: 7.5187 s/iter data_time: 0.0021 s/iter total_throughput: 0.53 samples/s lr: 1.50e-04
[09/10 10:43:32 lb.utils.events]: eta: 21:05:47 iteration: 39/10000 consumed_samples: 160 total_loss: 9.027 time: 7.6572 s/iter data_time: 0.0019 s/iter total_throughput: 0.52 samples/s lr: 1.50e-04
[09/10 10:46:05 lb.utils.events]: eta: 21:06:05 iteration: 59/10000 consumed_samples: 240 total_loss: 8.362 time: 7.6549 s/iter data_time: 0.0015 s/iter total_throughput: 0.52 samples/s lr: 1.50e-04
[09/10 10:48:42 lb.utils.events]: eta: 21:08:55 iteration: 79/10000 consumed_samples: 320 total_loss: 7.847 time: 7.7127 s/iter data_time: 0.0013 s/iter total_throughput: 0.52 samples/s lr: 1.50e-04
[09/10 10:51:22 lb.utils.events]: eta: 21:18:52 iteration: 99/10000 consumed_samples: 400 total_loss: 7.628 time: 7.7640 s/iter data_time: 0.0013 s/iter total_throughput: 0.52 samples/s lr: 1.50e-04
[09/10 10:53:53 lb.utils.events]: eta: 21:04:10 iteration: 119/10000 consumed_samples: 480 total_loss: 7.441 time: 7.7314 s/iter data_time: 0.0013 s/iter total_throughput: 0.52 samples/s lr: 1.50e-04
[09/10 10:50:47 libai]: >>> done with building model. Building time: 5.722 seconds
WARNING [09/10 10:50:47 lb.scheduler.lr_scheduler]: warmup iters equals to zero, return CosineLR
[09/10 10:50:50 lb.engine.trainer]: Starting training from iteration 0
[09/10 10:50:54 lb.utils.events]: eta: 0:10:15 iteration: 19/10000 consumed_samples: 80 total_loss: 9.83 time: 0.0689 s/iter data_time: 0.0008 s/iter total_throughput: 58.05 samples/s lr: 1.50e-04
[09/10 10:50:58 lb.utils.events]: eta: 0:10:15 iteration: 39/10000 consumed_samples: 160 total_loss: 9.122 time: 0.1458 s/iter data_time: 0.0007 s/iter total_throughput: 27.43 samples/s lr: 1.50e-04
[09/10 10:51:00 lb.utils.events]: eta: 0:10:12 iteration: 59/10000 consumed_samples: 240 total_loss: 8.388 time: 0.1214 s/iter data_time: 0.0007 s/iter total_throughput: 32.94 samples/s lr: 1.50e-04
[09/10 10:51:03 lb.utils.events]: eta: 0:10:11 iteration: 79/10000 consumed_samples: 320 total_loss: 8.019 time: 0.1357 s/iter data_time: 0.0008 s/iter total_throughput: 29.48 samples/s lr: 1.50e-04
[09/10 10:51:05 lb.utils.events]: eta: 0:10:09 iteration: 99/10000 consumed_samples: 400 total_loss: 7.635 time: 0.1232 s/iter data_time: 0.0008 s/iter total_throughput: 32.47 samples/s lr: 1.50e-04
[09/10 10:51:06 lb.utils.events]: eta: 0:10:09 iteration: 119/10000 consumed_samples: 480 total_loss: 7.461 time: 0.1132 s/iter data_time: 0.0008 s/iter total_throughput: 35.34 samples/s lr: 1.50e-04
[09/10 10:51:08 lb.utils.events]: eta: 0:10:09 iteration: 139/10000 consumed_samples: 560 total_loss: 7.367 time: 0.1061 s/iter data_time: 0.0009 s/iter total_throughput: 37.72 samples/s lr: 1.50e-04
[09/10 10:51:09 lb.utils.events]: eta: 0:10:06 iteration: 159/10000 consumed_samples: 640 total_loss: 7.305 time: 0.1003 s/iter data_time: 0.0008 s/iter total_throughput: 39.88 samples/s lr: 1.50e-04
[09/10 10:51:10 lb.utils.events]: eta: 0:10:04 iteration: 179/10000 consumed_samples: 720 total_loss: 7.214 time: 0.0975 s/iter data_time: 0.0008 s/iter total_throughput: 41.02 samples/s lr: 1.50e-04
[09/10 10:51:12 lb.utils.events]: eta: 0:10:03 iteration: 199/10000 consumed_samples: 800 total_loss: 7.132 time: 0.0940 s/iter data_time: 0.0007 s/iter total_throughput: 42.55 samples/s lr: 1.50e-04
[09/10 10:51:13 lb.utils.events]: eta: 0:10:02 iteration: 219/10000 consumed_samples: 880 total_loss: 6.986 time: 0.0911 s/iter data_time: 0.0008 s/iter total_throughput: 43.93 samples/s lr: 1.50e-04
[09/10 10:51:14 lb.utils.events]: eta: 0:10:01 iteration: 239/10000 consumed_samples: 960 total_loss: 6.866 time: 0.0886 s/iter data_time: 0.0009 s/iter total_throughput: 45.15 samples/s lr: 1.50e-04
[09/10 10:51:18 lb.utils.events]: eta: 0:10:00 iteration: 259/10000 consumed_samples: 1040 total_loss: 6.764 time: 0.0958 s/iter data_time: 0.0008 s/iter total_throughput: 41.74 samples/s lr: 1.50e-04
[09/10 10:51:19 lb.utils.events]: eta: 0:09:58 iteration: 279/10000 consumed_samples: 1120 total_loss: 6.655 time: 0.0933 s/iter data_time: 0.0008 s/iter total_throughput: 42.85 samples/s lr: 1.50e-04
Getting Started
Prepare the Data and Vocabulary
Ensure that `gpt_data` is in the correct location. Then, in `configs/gpt2_pretrain.py`, adjust the following configuration based on your specific environment:
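For reference, here is a minimal sketch of the data-related fields in `configs/gpt2_pretrain.py`, assuming LiBai's default file names under `gpt_data`; the exact paths on your machine may differ:

```python
# configs/gpt2_pretrain.py (sketch; the paths below are assumptions based
# on LiBai's defaults and should point at your local gpt_data directory)
vocab_file = "./data_test/gpt_data/gpt2-vocab.json"
merge_files = "./data_test/gpt_data/gpt2-merges.txt"
data_prefix = "./data_test/gpt_data/loss_compara_content_sentence"
```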
How to Train the GPT-2 Model with NPU/XPU
If you want to train on XPU, change 'npu' to 'xpu' (see the sketch below).
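A minimal sketch of the device selection, assuming the device type is set through `train.dist.device_type` in the config (the key name is an assumption; verify it against your LiBai version):

```python
# In configs/gpt2_pretrain.py (sketch; the config key is an assumption)
train.dist.device_type = "npu"  # change "npu" to "xpu" to train on XPU
```

With the config adjusted, the run can then be launched the usual LiBai way, e.g. `bash tools/train.sh tools/train_net.py configs/gpt2_pretrain.py 1` for a single device.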