Closed — CodeStarting-design closed this issue 1 year ago.
Hi @CodeStarting-design, you can consider turning off amp, which sometimes leads to nan on certain devices. Moreover, setting grad_clip may also help.
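For context, here is a minimal sketch of the failure mode behind this advice: fp16 (used by amp) tops out around 65504, so large intermediate values overflow to inf, and inf - inf then produces nan. Clipping the gradient norm bounds update magnitudes and can avoid this. The `clip_grad_norm` helper below is hypothetical, written for illustration with NumPy; it is not code from this repo (in PyTorch you would use `torch.nn.utils.clip_grad_norm_` instead).

```python
import numpy as np

# fp16 has a max finite value of ~65504. Intermediate results such as
# large attention logits can overflow to inf under mixed precision,
# and inf - inf yields nan.
x = np.float16(300.0)
overflow = np.exp(x, dtype=np.float16)  # exp(300) exceeds fp16 range -> inf
nan_val = overflow - overflow           # inf - inf -> nan

# Hypothetical gradient-norm clipping (illustration only, not repo code):
# rescale all gradients so their global L2 norm is at most max_norm.
def clip_grad_norm(grads, max_norm):
    total = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
    scale = max_norm / (total + 1e-6)
    if scale < 1.0:
        grads = [g * scale for g in grads]
    return grads, total
```

In a real PyTorch training loop the equivalent step is calling `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)` between `loss.backward()` and `optimizer.step()`.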
Hi @CodeStarting-design, we have fixed the numerical instability problem. Now the models can be trained with auto mixed precision (amp) in downstream tasks without encountering nan.
Thank you for your outstanding work. I tried to transfer the focused linear attention module proposed in your paper to a downstream dehazing task, but during training the model parameters became nan. However, replacing the FLatten module with Swin does not cause this problem. I hope to get your answer!