According to your paper, the non-local block is initialized with zero weights, so that, thanks to its shortcut connection, it does not influence the original network at the start of training. But in your implementation, the non-local block is initialized the same way as a normal conv, as shown in the config:
__C.NONLOCAL.USE_ZERO_INIT_CONV = False
__C.NONLOCAL.CONV_INIT_STD = 0.01
Is there any difference in performance between these two initialization schemes?
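To make the question concrete, here is a minimal pure-Python sketch of the two options as I understand them. It is not the repository's code: the helper names (`conv1x1`, `init_weights`, `residual_output`) are hypothetical, `y` is just random data standing in for the attention output, and I am assuming that `USE_ZERO_INIT_CONV = False` means the output conv is drawn from a Gaussian with std `CONV_INIT_STD`.

```python
import random

def conv1x1(weights, features):
    """Apply a 1x1 conv: mix channels independently at each position.
    weights: [out_ch][in_ch]; features: [in_ch][positions]."""
    n_pos = len(features[0])
    return [
        [sum(w * features[c][p] for c, w in enumerate(row)) for p in range(n_pos)]
        for row in weights
    ]

def init_weights(out_ch, in_ch, use_zero_init, std, rng):
    """Hypothetical stand-in for the two options behind USE_ZERO_INIT_CONV."""
    if use_zero_init:
        return [[0.0] * in_ch for _ in range(out_ch)]
    # Assumption: otherwise the weights are Gaussian with the given std.
    return [[rng.gauss(0.0, std) for _ in range(in_ch)] for _ in range(out_ch)]

def residual_output(x, y, w_out):
    """z = W_z * y + x, the shortcut form of the block's output."""
    wy = conv1x1(w_out, y)
    return [[a + b for a, b in zip(row_wy, row_x)] for row_wy, row_x in zip(wy, x)]

rng = random.Random(0)
channels, positions = 4, 6
x = [[rng.uniform(-1, 1) for _ in range(positions)] for _ in range(channels)]
# Random stand-in for the attention output that feeds the output conv.
y = [[rng.uniform(-1, 1) for _ in range(positions)] for _ in range(channels)]

z_zero = residual_output(x, y, init_weights(channels, channels, True, 0.01, rng))
z_gauss = residual_output(x, y, init_weights(channels, channels, False, 0.01, rng))

# Largest element-wise deviation from the input under Gaussian init.
max_dev = max(abs(a - b) for rz, rx in zip(z_gauss, x) for a, b in zip(rz, rx))
print("zero init is identity:", z_zero == x)
print("max deviation with std=0.01:", max_dev)
```

With zero init the block passes `x` through unchanged, while with std 0.01 the output is only a small perturbation of `x`, which is why I am asking whether the difference matters in practice.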