caiyuanhao1998 / Retinexformer

"Retinexformer: One-stage Retinex-based Transformer for Low-light Image Enhancement" (ICCV 2023) & (NTIRE 2024 Challenge)
https://arxiv.org/abs/2303.06705
MIT License

About using Automatic Mixed Precision #93

Closed Koruvika closed 2 months ago

Koruvika commented 2 months ago

Because of limited resources, I tried to train Retinexformer on LOLv2 with use_amp=true (the rest of the config is the same as RetinexFormer_LOL_v2_real.yml), but the loss and PSNR do not seem to improve. Did you try this config, and do you have any recommendations for using AMP?
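For context, a `use_amp`-style flag usually just wraps the training step in PyTorch's automatic mixed precision. Below is a minimal, generic sketch of such a step using `torch.cuda.amp`; the model, optimizer, and L1 loss here are placeholders for illustration, not the repo's actual training code:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Generic AMP training step. `model`, `optimizer`, and the L1 loss are
# placeholders; this is not Retinexformer's exact training loop.
scaler = GradScaler()

def train_step(model, optimizer, low, gt):
    optimizer.zero_grad()
    with autocast():                      # forward pass + loss in mixed precision
        pred = model(low)
        loss = torch.nn.functional.l1_loss(pred, gt)
    scaler.scale(loss).backward()         # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)                # unscales gradients, then steps the optimizer
    scaler.update()                       # adapt the loss scale for the next iteration
    return loss.item()
```

If the loss scaler keeps skipping steps due to gradient overflow, an AMP run can stall even when the fp32 run trains fine, so it is worth confirming the fp32 baseline first.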

Koruvika commented 2 months ago

[W&B chart, 7/11/2024 11:58:23 AM]

Koruvika commented 2 months ago

[W&B chart, 7/11/2024 11:58:35 AM]

caiyuanhao1998 commented 2 months ago

I suggest you use the 'Retinexformer' conda env to reproduce the results on LOL-v2-real. AMP is meant for the 'torch2' conda env. I haven't tried using AMP on LOL-v2; I only used it on the NTIRE 2024 datasets, where the images are 4K x 6K.

Koruvika commented 2 months ago

I just tried training on LOLv2 again, and the cause was not AMP: I had accidentally set weight-decay=0.0001, which is why the model did not improve. With weight-decay=0, the model trains normally whether or not AMP is used. :))) Hmm... I see you commented out the weight-decay line in the config files, so maybe you had already tried configs with it @caiyuanhao1998
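In case it helps others comparing the two settings, the difference boils down to the `weight_decay` argument passed to the optimizer. A minimal sketch (the model and the other Adam hyperparameters are placeholders, not the values from the released configs):

```python
import torch

# Placeholder model; lr/betas are illustrative, not the repo's settings.
model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)

# Setting that reportedly stalled training:
opt_decay = torch.optim.Adam(model.parameters(), lr=2e-4,
                             betas=(0.9, 0.999), weight_decay=1e-4)

# Setting that trained normally (weight decay disabled, matching the
# commented-out weight-decay line in the config files):
opt_no_decay = torch.optim.Adam(model.parameters(), lr=2e-4,
                                betas=(0.9, 0.999), weight_decay=0.0)
```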

liaaabegin commented 1 month ago

I have also run into the problem of the loss not decreasing on the NTIRE dataset. In my case weight_decay is commented out, so that is probably not the cause. I set the patch size to 256×256 with a batch size of 4, kept the rest of the config the same as RetinexFormer_NTIRE.yml, and trained for 10,000 iterations. The L1 loss stayed around 6e-2. I also tried the LOL_v2_real dataset and saw the same behavior. Are there any suggestions on this?