massica17 opened 1 year ago
“we use batch normalization instead of layer normalization as we found batch normalization gains more for the segmentation performance.” I cannot understand why. My guess is that it is because you replace self-attention with convolutions, but I am not sure.
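For context, here is a minimal PyTorch sketch (my own illustration, not the authors' code) of why the two norms pair with different block types: BatchNorm computes per-channel statistics over the batch and spatial dimensions, which matches conv features, while LayerNorm computes per-token statistics over the channel dimension, which matches attention-style token sequences.

```python
import torch
import torch.nn as nn

# Hypothetical feature map: batch of 4, 64 channels, 32x32 spatial grid.
x = torch.randn(4, 64, 32, 32)

# BatchNorm2d: one (mean, var) per channel, estimated over (batch, H, W).
# A conv channel means the same thing at every position and sample,
# so sharing statistics this way is a good fit.
bn = nn.BatchNorm2d(64)
y_bn = bn(x)

# LayerNorm: one (mean, var) per token, computed over the channel dim.
# Attention blocks treat the input as a sequence of tokens and should
# not depend on batch composition, so each token is normalized alone.
tokens = x.flatten(2).transpose(1, 2)  # (4, 1024, 64): batch, tokens, channels
ln = nn.LayerNorm(64)
y_ln = ln(tokens)

print(y_bn.shape, y_ln.shape)
# torch.Size([4, 64, 32, 32]) torch.Size([4, 1024, 64])
```

So if the model swaps self-attention for conv operations, BatchNorm becomes the natural choice, which may be what the quoted sentence is hinting at.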