Oneflow-Inc / models

Models and examples built with OneFlow

add dcn 40M amp script #370

Open MARD1NO opened 2 years ago

MARD1NO commented 2 years ago

The main differences from the fp32 script are:

lr -> 0.003; Adam's eps is increased to 1e-4; the lr_scheduler was modified slightly
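The PR does not say why eps was raised. One common motivation is that a larger eps keeps the Adam denominator away from zero when the second-moment estimate is tiny, which damps the update. A minimal sketch of that effect in plain Python, not the OneFlow optimizer: `adam_first_step` is a hypothetical helper, the gradient value is made up, and 1e-8 is only the commonly used default eps (the fp32 script's actual value is not shown here).

```python
import math

def adam_first_step(grad, lr, eps, beta1=0.9, beta2=0.999):
    """Size of the first bias-corrected Adam update for a scalar gradient."""
    m = (1 - beta1) * grad          # first-moment estimate after one step
    v = (1 - beta2) * grad * grad   # second-moment estimate after one step
    m_hat = m / (1 - beta1)         # bias correction for t = 1
    v_hat = v / (1 - beta2)
    return lr * m_hat / (math.sqrt(v_hat) + eps)

tiny_grad = 1e-6  # hypothetical small gradient, for illustration only

# Assumed common default eps versus the eps used in this AMP script
step_default_eps = adam_first_step(tiny_grad, lr=0.003, eps=1e-8)
step_amp_eps = adam_first_step(tiny_grad, lr=0.003, eps=1e-4)

print(f"eps=1e-8: step = {step_default_eps:.3e}")
print(f"eps=1e-4: step = {step_amp_eps:.3e}")
```

With a gradient this small, the eps=1e-4 update is roughly 100 times smaller than the eps=1e-8 update; for gradients much larger than eps the two are nearly identical.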

Rank[0], Epoch 7, Step 70000, AUC 0.795594, LogLoss 0.123677, Eval_time 24.48 s, Metrics_time 4.56 s, Eval_samples 89140000, GPU_Memory 14512 MiB, Host_Memory 12285 MiB, 2022-07-27 10:25:39
Rank[0], Step 71000, Loss 0.1217, Latency 9.213 ms, Throughput 6001880.3, 2022-07-27 10:25:48
Rank[0], Step 72000, Loss 0.1199, Latency 9.162 ms, Throughput 6035043.0, 2022-07-27 10:25:57
Rank[0], Step 73000, Loss 0.1265, Latency 9.307 ms, Throughput 5941121.9, 2022-07-27 10:26:07
Rank[0], Step 74000, Loss 0.1234, Latency 9.534 ms, Throughput 5799602.2, 2022-07-27 10:26:16
Rank[0], Step 75000, Loss 0.1247, Latency 9.194 ms, Throughput 6014392.4, 2022-07-27 10:26:25
================ Test Evaluation ================
Rank[0], Epoch 7, Step 75000, AUC 0.801010, LogLoss 0.126012, Eval_time 19.27 s, Metrics_time 4.80 s, Eval_samples 89140000, GPU_Memory 14512 MiB, Host_Memory 9733 MiB, 2022-07-27 10:26:50

Simply lowering the lr in the fp32 script also works. DCN AMP with lr=0.001:

================ Test Evaluation ================
Rank[0], Step 75000, AUC 0.803214, LogLoss 0.000000, Eval_time 2.04 s, Metrics_time 2.24 s, Eval_samples 89192448, GPU_Memory 14507 MiB, Host_Memory 6214 MiB, 2022-07-27 15:47:50
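To compare runs like the two test evaluations above, the metric fields can be pulled out of the log lines. This is a rough sketch, assuming the log format stays as shown; `parse_metrics` is a hypothetical helper and not part of the repository.

```python
import re

# Matches "<name> <number>" pairs for the metrics of interest.
FIELD = re.compile(r"\b(AUC|LogLoss|Loss|Latency|Throughput) ([0-9.]+)")

def parse_metrics(line):
    """Return the numeric metrics found in one log line as a dict."""
    return {name: float(value) for name, value in FIELD.findall(line)}

# The two test-evaluation lines, copied from the logs above.
run_lr_0003 = ("Rank[0], Epoch 7, Step 75000, AUC 0.801010, LogLoss 0.126012, "
               "Eval_time 19.27 s, Metrics_time 4.80 s, Eval_samples 89140000, "
               "GPU_Memory 14512 MiB, Host_Memory 9733 MiB, 2022-07-27 10:26:50")
run_lr_0001 = ("Rank[0], Step 75000, AUC 0.803214, LogLoss 0.000000, "
               "Eval_time 2.04 s, Metrics_time 2.24 s, Eval_samples 89192448, "
               "GPU_Memory 14507 MiB, Host_Memory 6214 MiB, 2022-07-27 15:47:50")

auc_a = parse_metrics(run_lr_0003)["AUC"]
auc_b = parse_metrics(run_lr_0001)["AUC"]
print(f"AUC difference: {auc_b - auc_a:+.6f}")
```

The same helper also works on the training-step lines, where it returns the `Loss`, `Latency`, and `Throughput` fields.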