Created a UNET that is almost the same as NAFNet, but without channel attention and layer normalization, and with the SimpleGate replaced by a ReLU.
Please be aware that the two networks have the same number of parameters, but there is a subtlety: since the gate activation inherently halves the channel count, I had to adapt the number of output channels of the convolution preceding the activation, so that convolution is not really depthwise anymore.
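As a sketch of that channel bookkeeping (a hypothetical `ReLUBlock`; the 2x expansion and the residual structure mirror NAFNet's block, but the exact layout here is my assumption, not the repo's code):

```python
import torch
import torch.nn as nn

class ReLUBlock(nn.Module):
    """NAFNet-style block with SimpleGate replaced by ReLU, and without
    channel attention or layer normalization (hypothetical sketch)."""

    def __init__(self, c, dw_expand=2):
        super().__init__()
        dw_channel = c * dw_expand
        # 1x1 pointwise expansion, as in NAFNet
        self.conv1 = nn.Conv2d(c, dw_channel, kernel_size=1)
        # In NAFNet this 3x3 conv is depthwise and SimpleGate then halves
        # the channels. A ReLU preserves the width, so the conv itself must
        # halve the channels; with out != in it can only be a grouped conv,
        # not a true depthwise one. The weight count still matches: grouped
        # (dw -> dw/2, groups=dw/2) has 9*dw weights, the same as a
        # depthwise 3x3 over dw channels.
        self.conv2 = nn.Conv2d(dw_channel, dw_channel // 2, kernel_size=3,
                               padding=1, groups=dw_channel // 2)
        self.relu = nn.ReLU(inplace=True)
        # 1x1 pointwise projection back to c channels
        self.conv3 = nn.Conv2d(dw_channel // 2, c, kernel_size=1)

    def forward(self, x):
        y = self.conv3(self.relu(self.conv2(self.conv1(x))))
        return x + y  # residual connection, as in NAFNet


if __name__ == "__main__":
    block = ReLUBlock(c=32)
    out = block(torch.randn(1, 32, 64, 64))
    print(out.shape)  # torch.Size([1, 32, 64, 64])
```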
Ability to train NAFNet
NAFNet detail
Important to know: the NAFNet configuration files give the exact configuration used during their training, e.g. https://github.com/megvii-research/NAFNet/blob/main/options/train/GoPro/NAFNet-width64.yml
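A minimal way to inspect those settings, assuming the NAFNet repo is cloned locally and PyYAML is installed (in their BasicSR-style option files, the architecture hyperparameters should live under the `network_g` key):

```python
import yaml  # pip install pyyaml

# Hypothetical local path; adjust to wherever the NAFNet repo is cloned.
cfg_path = "NAFNet/options/train/GoPro/NAFNet-width64.yml"

with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

# Architecture hyperparameters (width, encoder/middle/decoder block counts)
print(cfg["network_g"])
# Training hyperparameters (optimizer, learning rate schedule, iterations)
print(cfg["train"])
```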
UNET