In the paper, the authors stated:
> Since, as pointed out in [12], conv4_3 has a different feature scale compared to the other layers, we use the L2 normalization technique introduced in [12] to scale the feature norm at each location in the feature map to 20 and learn the scale during back propagation.
Because there wasn't an equivalent implementation for TensorFlow, I just used a BatchNorm layer instead. I tried to implement an L2 normalization layer on my own, but the loss didn't converge as expected.
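For reference, here is a minimal sketch of the kind of layer the paper describes, written as a custom `tf.keras` layer. The class name `L2Normalization` and the `init_scale` argument are my own naming, not from this repo; this is just an illustration under those assumptions, not a drop-in fix:

```python
import tensorflow as tf

class L2Normalization(tf.keras.layers.Layer):
    """L2-normalizes each spatial location across channels, then rescales
    with a learnable per-channel scale initialized to 20, as the SSD paper
    describes for conv4_3. Names here are illustrative, not from this repo."""

    def __init__(self, init_scale=20.0, **kwargs):
        super().__init__(**kwargs)
        self.init_scale = init_scale

    def build(self, input_shape):
        # One learnable scale per channel, updated by backpropagation.
        self.gamma = self.add_weight(
            name='gamma',
            shape=(input_shape[-1],),
            initializer=tf.keras.initializers.Constant(self.init_scale),
            trainable=True)

    def call(self, inputs):
        # Normalize the channel vector at each spatial location, then rescale.
        x = tf.math.l2_normalize(inputs, axis=-1)
        return x * self.gamma
```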
As far as I can tell, the paper says nothing about batch normalization, and in the code only the first feature layer applies this operation. Is it actually useful? Could we apply it to every feature layer before compute_heads?
https://github.com/ChunML/ssd-tf2/blob/53e481ade7016c6d83e008a84f7cd18e59c75242/network.py#L92