LeapLabTHU / FLatten-Transformer

Official repository of FLatten Transformer (ICCV2023)

Migrating to downstream tasks: training does not converge #5

Closed CodeStarting-design closed 1 year ago

CodeStarting-design commented 1 year ago

Thank you for your outstanding work. I tried to transfer the focused linear attention module proposed in your paper to a downstream dehazing task, but during training the model parameters become NaN. Replacing the FLatten module with Swin does not cause this problem. I hope you can provide some guidance!

tian-qing001 commented 1 year ago

Hi @CodeStarting-design, you can consider turning off AMP, which sometimes leads to NaN on certain devices. Moreover, setting grad_clip may also help.
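For reference, here is a minimal sketch of a training step with AMP disabled and gradient clipping enabled. This is not code from the repository; `model`, `loader`, `optimizer`, and `criterion` are hypothetical placeholders for your own dehazing setup.

```python
# Minimal sketch (not from this repository): a standard PyTorch training step
# in full fp32 (no AMP) with gradient norm clipping.
# `model`, `loader`, `optimizer`, `criterion` are placeholders.
import torch
from torch.nn.utils import clip_grad_norm_

def train_one_epoch(model, loader, optimizer, criterion, device, max_grad_norm=1.0):
    model.train()
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        # Forward pass outside any autocast context, i.e. full fp32,
        # which avoids the fp16 overflow that can produce NaN losses.
        outputs = model(images)
        loss = criterion(outputs, targets)
        loss.backward()
        # Clip the global gradient norm so a single bad batch cannot
        # blow up the parameters.
        clip_grad_norm_(model.parameters(), max_norm=max_grad_norm)
        optimizer.step()
```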

tian-qing001 commented 4 months ago

Hi @CodeStarting-design, we have fixed the numerical instability problem. The models can now be trained with automatic mixed precision (AMP) in downstream tasks without encountering NaN.
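For completeness, this is the kind of AMP training step the fixed models are expected to tolerate. Again a sketch rather than repository code; all names are placeholders.

```python
# Minimal sketch (not from this repository): a typical PyTorch AMP training
# step using autocast and GradScaler. Names are placeholders.
import torch

scaler = torch.cuda.amp.GradScaler()

def train_step_amp(model, images, targets, optimizer, criterion, device):
    images, targets = images.to(device), targets.to(device)
    optimizer.zero_grad()
    # Run the forward pass in mixed precision.
    with torch.cuda.amp.autocast():
        outputs = model(images)
        loss = criterion(outputs, targets)
    # Scale the loss before backward so small fp16 gradients do not underflow.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```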