feinanshan / FANet


class BatchNorm2D in fanet.py #5

Open TranThanh96 opened 4 years ago

TranThanh96 commented 4 years ago

I am confused about this! You defined batch norm, but in the forward function you only apply the activation. Can you explain this?

```python
class BatchNorm2d(nn.BatchNorm2d):
    """(conv => BN => ReLU) * 2"""

    def __init__(self, num_features, activation='none'):
        super(BatchNorm2d, self).__init__(num_features=num_features)
        if activation == 'leaky_relu':
            self.activation = nn.LeakyReLU()
        elif activation == 'none':
            self.activation = lambda x: x
        else:
            raise Exception("Accepted activation: ['leaky_relu']")

    def forward(self, x):
        return self.activation(x)
```

Eashwar93 commented 3 years ago

I am not entirely sure, but on analyzing the model with Netron I can see BN layers. Probably the overridden forward function just appends the activation (LeakyReLU) after the BN.

mehtadushy commented 3 years ago

@Eashwar93 If the last part of what you said were true, it would be a major PyTorch bug. The derived module carries the parameters of batch norm, but the forward call is overridden.
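
For anyone else reading along, here is a quick self-contained way to confirm this (a minimal sketch, not from the repo; the class is condensed to the `leaky_relu` branch):

```python
import torch
import torch.nn as nn

# Condensed copy of the class above, just to show the effect of the override.
class BatchNorm2d(nn.BatchNorm2d):
    def __init__(self, num_features):
        super().__init__(num_features=num_features)
        self.activation = nn.LeakyReLU()

    def forward(self, x):
        # nn.BatchNorm2d.forward is never called, so no normalization is applied.
        return self.activation(x)

bn = BatchNorm2d(3).eval()
x = torch.randn(1, 3, 4, 4)
print(bn.weight.shape, bn.running_mean.shape)    # the BN parameters are still there...
print(torch.allclose(bn(x), nn.LeakyReLU()(x)))  # ...but the output is just LeakyReLU(x): True
```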

mehtadushy commented 3 years ago

@feinanshan This bug raises suspicion about whether the rest of the implementation matches the model actually used to generate the results in the paper. It would be very helpful if you could release a pre-trained model.

feinanshan commented 3 years ago

Hi, this code is just for speed evaluation, so the BN layers are removed as they can be merged into the Conv layers before them.
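
For context, a minimal sketch of the standard BN-into-conv folding being referred to here (the `fuse_conv_bn` helper is my own illustration, not from the repo):

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Return a single Conv2d equivalent to conv followed by bn in eval mode."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups,
                      bias=True)  # the fused conv needs a bias term
    scale = bn.weight.data / torch.sqrt(bn.running_var + bn.eps)  # gamma / sqrt(var + eps)
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    conv_bias = conv.bias.data if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.data = (conv_bias - bn.running_mean) * scale + bn.bias.data
    return fused

# Equivalence check (eval mode, so BN uses its running statistics)
conv = nn.Conv2d(3, 8, 3, padding=1, bias=False).eval()
bn = nn.BatchNorm2d(8).eval()
bn.running_mean.uniform_(-1, 1)   # make the check non-trivial
bn.running_var.uniform_(0.5, 2.0)
x = torch.randn(1, 3, 16, 16)
print(torch.allclose(fuse_conv_bn(conv, bn)(x), bn(conv(x)), atol=1e-5))  # True
```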

mehtadushy commented 3 years ago

There is a different bug here then: the convs into which the BN is fused should have their bias terms enabled (bias=True).

feinanshan commented 3 years ago

Yes, it should be. Yet that doesn't affect the efficiency.
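
A rough way to see why the missing bias is a correctness issue but not a speed issue (a minimal sketch with arbitrary channel counts, not from the repo):

```python
import torch.nn as nn

# Same conv with and without the bias term: the few extra parameters (and one
# per-channel add) are negligible for timing, but without them the folded BN
# shift (beta - scale * mean) has nowhere to go, so the output is wrong.
conv_nobias = nn.Conv2d(64, 64, 3, padding=1, bias=False)
conv_bias   = nn.Conv2d(64, 64, 3, padding=1, bias=True)
print(sum(p.numel() for p in conv_nobias.parameters()))  # 36864
print(sum(p.numel() for p in conv_bias.parameters()))    # 36928 (+64 bias values)
```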

Eashwar93 commented 3 years ago

> @Eashwar93 If the last part of what you said were true, it would be a major PyTorch bug. The derived module carries the parameters of batch norm, but the forward call is overridden.

@mehtadushy I am pretty sure you are right. Visualizing the BN layers in Netron confused me. Thanks for the clarification.

Eashwar93 commented 3 years ago

> Hi, this code is just for speed evaluation, so the BN layers are removed as they can be merged into the Conv layers before them.

@pinghu6 It would be nice if you could share a trainable model if you can't provide a trained one. If that is not possible either, could you please clarify whether there are any other aspects I need to take care of while developing a trainable model based on your test model?

It is also a bit worrying that the BiseNet model's speed is tested with the BN layers while FANet is tested without them. Thanks once again!!

mehtadushy commented 3 years ago

@Eashwar93 BiseNet also has the same BN override

Eashwar93 commented 3 years ago

@mehtadushy I am a bit new to this world of deep learning, so please excuse my mistakes. I was being careless and didn't see the class definition.