AngeLouCN / CaraNet

Context Axial Reverse Attention Network for Small Medical Objects Segmentation

Why add a PRELU output to a linear output? #23

Closed JohnMBrandt closed 1 year ago

JohnMBrandt commented 2 years ago

The output of the partial decoder (https://github.com/AngeLouCN/CaraNet/blob/main/lib/partial_decoder.py#L30) has a linear activation, while the output of each of the axial attention modules, which are designed for residual learning, goes through a BN-PReLU (https://github.com/AngeLouCN/CaraNet/blob/main/CaraNet.py#L47).

The summed output (decoder + axial transformer 1, 2, 3, 4) then gets a sigmoid activation to generate class probabilities.
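For readers skimming the thread, here is a minimal sketch of the fusion pattern being questioned (not the repository's actual code; module names and channel sizes are illustrative): a linear partial-decoder map is summed with a BN-PReLU branch output, and the sum goes through a sigmoid.

```python
import torch
import torch.nn as nn

class FusionSketch(nn.Module):
    """Illustrative sketch of the output fusion under discussion (names are hypothetical)."""
    def __init__(self, channels=32):
        super().__init__()
        # Partial-decoder head: a plain conv, i.e. a linear (identity) activation.
        self.decoder_head = nn.Conv2d(channels, 1, kernel_size=1)
        # Stand-in for an axial-attention branch: ends in BN + PReLU before being added back.
        self.axial_branch = nn.Sequential(
            nn.Conv2d(1, 1, kernel_size=3, padding=1),
            nn.BatchNorm2d(1),
            nn.PReLU(),
        )

    def forward(self, feat):
        base = self.decoder_head(feat)       # linear output of the partial decoder
        residual = self.axial_branch(base)   # BN-PReLU output of an axial-attention stage
        logits = base + residual             # the summation questioned in this issue
        return torch.sigmoid(logits)         # class probabilities
```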

Why modify the original linear output (from the partial decoder) with a nonlinear function that biases positive (PReLU)? Doesn't this mean that you're more likely to saturate the sigmoid by having a large input? Or, at the very least, result in exploding biases for the partial decoder?

My understanding of residual learning is that it's commonly done with no activation functions prior to the summation, to prevent exploding biases (continually adding a positive value to a positive value).
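For comparison, a minimal sketch of the convention described above, assuming a ResNet-style basic block: the residual path ends in BN, and the only activation is applied after the summation.

```python
import torch.nn as nn

class PlainResidualBlock(nn.Module):
    """ResNet-style basic block: no activation on the residual path
    immediately before the summation with the identity."""
    def __init__(self, channels=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),  # note: no ReLU/PReLU here
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.body(x))  # activation only after the addition
```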

AngeLouCN commented 2 years ago

Hi, thank you for your interest. We directly reuse the CFP module code from our previous work, which was a multi-class semantic segmentation task, and in that setting PReLU performed better than ReLU.
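As a side note, PReLU scales negative inputs by a learnable slope (initialized to 0.25 in PyTorch) rather than zeroing them, so unlike ReLU its output is not strictly non-negative; a quick illustration:

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, -0.5, 0.0, 1.0])

relu = nn.ReLU()
prelu = nn.PReLU()  # learnable negative slope, initialized to 0.25

print(relu(x))   # tensor([0., 0., 0., 1.])                     negatives are zeroed
print(prelu(x))  # tensor([-0.5000, -0.1250, 0.0000, 1.0000])   with the default slope
```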