xiezipeng-ML closed 2 years ago
We need to run a benchmark and look at how the performance metrics change before merging.
# num_layers=6
After the change:
GPU memory: 2459MiB
[10/28 06:00:17 libai]: >>> done with building model. Building time: 0.902 seconds
[10/28 06:03:40 lb.utils.events]: eta: 0:51:33 iteration: 849/24000 consumed_samples: 6800 total_loss: 3.505 time: 0.1323 s/iter data_time: 0.0114 s/iter total_throughput: 60.49 samples/s lr: 8.49e-05
Before the change:
GPU memory: 2459MiB
done with building model. Building time: 1.366 seconds
[10/28 05:59:29 lb.utils.events]: eta: 0:51:30 iteration: 849/24000 consumed_samples: 6800 total_loss: 3.513 time: 0.1322 s/iter data_time: 0.0118 s/iter total_throughput: 60.51 samples/s lr: 8.49e-05
# num_layers=12
After the change:
GPU memory: 3587MiB
[10/28 06:06:15 libai]: >>> done with building model. Building time: 1.312 seconds
[10/28 06:12:40 lb.utils.events]: eta: 1:37:56 iteration: 849/24000 consumed_samples: 6800 total_loss: 3.514 time: 0.2518 s/iter data_time: 0.0117 s/iter total_throughput: 31.77 samples/s lr: 8.49e-05
Before the change:
GPU memory: 3587MiB
[10/28 06:13:43 libai]: >>> done with building model. Building time: 1.555 seconds
[10/28 06:20:14 lb.utils.events]: eta: 1:39:01 iteration: 849/24000 consumed_samples: 6800 total_loss: 3.514 time: 0.2579 s/iter data_time: 0.0176 s/iter total_throughput: 31.03 samples/s lr: 8.49e-05
@strint The impact of this change is probably limited.
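To make the before/after comparison easier to read, here is a rough sketch that pulls the timing fields out of two `lb.utils.events` lines and prints the relative change. The helper names `parse_metrics` and `pct_change` are made up for this comment and are not part of libai; the log strings are the tails of the num_layers=12 lines pasted above.

```python
import re

# Matches fields of the form "<name>: <number> s/iter" or "<name>: <number> samples/s".
METRIC_RE = re.compile(r"(\w+): ([\d.]+) (?:s/iter|samples/s)")


def parse_metrics(line):
    """Return the per-iteration timing and throughput fields of one log line."""
    return {name: float(value) for name, value in METRIC_RE.findall(line)}


def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Tails of the num_layers=12 log lines (hypothetical variable names).
before = parse_metrics(
    "total_loss: 3.514 time: 0.2579 s/iter data_time: 0.0176 s/iter "
    "total_throughput: 31.03 samples/s lr: 8.49e-05"
)
after = parse_metrics(
    "total_loss: 3.514 time: 0.2518 s/iter data_time: 0.0117 s/iter "
    "total_throughput: 31.77 samples/s lr: 8.49e-05"
)

for key in ("time", "data_time", "total_throughput"):
    delta = pct_change(before[key], after[key])
    print(f"{key}: {before[key]} -> {after[key]} ({delta:+.2f}%)")
```

For num_layers=12 the gap in `time` (0.2579 vs 0.2518 s/iter) is about the same size as the gap in `data_time` (0.0176 vs 0.0117 s/iter), so most of the throughput difference may come from data loading rather than from the change itself.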
https://github.com/Oneflow-Inc/libai/issues/406#issuecomment-1292151939
Based on wenxiao's comment linked above, refine where `to_global` is called in mt5's `compute_bias`.