[Targeting 2024 Q4] ExponentialMovingAverage does not work with fleet DistributedStrategy

PaddlePaddle / Paddle

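PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)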

http://www.paddlepaddle.org/

Apache License 2.0


Open Tom-Zheng opened 1 year ago

Tom-Zheng commented 1 year ago

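Doing this has significant side effects: it completely disables DistributedStrategy, so EMA cannot coexist with other optimizations. This should be treated as a bug.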


ForFishes commented 1 year ago

Hello, EMA currently only supports running under a plain program. Under other strategies, EMA needs additional adaptation.

Tom-Zheng commented 1 year ago

We currently need this for our PPYOLOE+ optimization work. Without EMA enabled, AP drops by 0.7 points (53.5% -> 52.8%). Please consider whether this can be supported.
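For readers less familiar with the technique, here is a minimal sketch of what an exponential moving average of weights computes, written in plain Python rather than with Paddle's API. The class name `SimpleEMA`, the parameter dictionary, and the decay value are hypothetical and chosen only for illustration; Paddle's own ExponentialMovingAverage may differ in details such as initialization and bias correction.

```python
class SimpleEMA:
    """Hypothetical helper, not Paddle's API.

    Keeps a shadow copy of each parameter, updated as
    shadow = decay * shadow + (1 - decay) * value.
    """

    def __init__(self, decay=0.999):
        self.decay = decay
        self.shadow = {}

    def update(self, params):
        # params maps a parameter name to its current (scalar) value.
        for name, value in params.items():
            if name not in self.shadow:
                # Assumption: seed the shadow with the first observed value.
                self.shadow[name] = value
            else:
                self.shadow[name] = (
                    self.decay * self.shadow[name] + (1.0 - self.decay) * value
                )


# Simulate three training steps for a single scalar weight "w".
ema = SimpleEMA(decay=0.9)
for w in [1.0, 2.0, 3.0]:
    ema.update({"w": w})

print(ema.shadow["w"])
```

At evaluation time the shadow values are used in place of the raw weights, which is the smoothing effect the AP numbers above depend on.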

LiYuRio commented 1 year ago

The only effect of this line is that the program is not executed with ParallelExecutor; the original executor is used instead.

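Is this in static graph mode, and do you need ParallelExecutor for graph optimization? The framework has now replaced ParallelExecutor with the new executor, and performance is roughly on par. Could you describe your use case in more detail?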

Tom-Zheng commented 1 year ago

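Based on the earlier discussion, this issue is related to Paddle's execution mechanism: enabling EMA prevents the IR graph passes from being executed. The relevant owner needs to fix this. cc: @LiYuRio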

jeng1220 commented 1 year ago

Let's revisit this after Q3.

onecatcn commented 5 months ago

Since the PPYOLOE project is on hold, we will check this issue in 24H2.

Tom-Zheng commented 1 month ago

Move to Q4