Open taosean opened 4 years ago
"I wonder, are these params related?" Yes. MODEL.USE_AFFINE=True converts the BN layers to "affine" layers, which effectively freezes them. That is why we need to set CHECKPOINT.CONVERT_MODEL=True: it converts the weights of the BN layers into a format that the affine layers can use. (See the further reply below for why and when we want to use it.)
NONLOCAL.USE_BN and NONLOCAL.USE_AFFINE mean slightly different things. Please see https://github.com/facebookresearch/video-long-term-feature-banks/blob/master/lib/models/nonlocal_helper.py#L146 for the exact implementation.
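To make the "convert BN weights for affine layers" idea concrete, here is a minimal sketch of the standard way a BN layer with fixed statistics can be folded into a per-channel scale and bias. It illustrates the general technique only and is not the repo's conversion code; the function names and the epsilon value are assumptions.

```python
import math

EPS = 1e-5  # assumed epsilon; use whatever value the original BN layer used


def fold_bn_into_affine(gamma, beta, running_mean, running_var, eps=EPS):
    """Return per-channel (scale, bias) equivalent to BN with fixed statistics."""
    # y = gamma * (x - mean) / sqrt(var + eps) + beta  ==  scale * x + bias
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, running_var)]
    bias = [b - m * s for b, m, s in zip(beta, running_mean, scale)]
    return scale, bias


def bn_inference(x, gamma, beta, running_mean, running_var, eps=EPS):
    """Inference-mode BN applied to one value per channel."""
    return [
        g * (xi - m) / math.sqrt(v + eps) + b
        for xi, g, b, m, v in zip(x, gamma, beta, running_mean, running_var)
    ]


def affine(x, scale, bias):
    """Frozen 'affine' layer: a fixed per-channel scale and shift."""
    return [s * xi + b for xi, s, b in zip(x, scale, bias)]
```

With the folded parameters, `affine` reproduces `bn_inference` for any input, which is why a frozen BN layer and an affine layer are interchangeable once the weights have been converted.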
"According to my understanding, if MODEL.USE_AFFINE=False (which means using SpatialBN), then CHECKPOINT.CONVERT_MODEL should be set as False. Is my understanding right?" Yes
"I see SpatialBN is 2d BN, can it be used in the model with 3d convolution?" Yes, for example, the 3D Conv at https://github.com/facebookresearch/video-long-term-feature-banks/blob/master/lib/models/model_builder_video.py#L176 uses the SpatialBN operator.
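As a rough illustration of why a "spatial" BN can serve 3D convolutions: batch normalization in general keeps one mean and variance per channel and pools over every other axis, so an extra temporal axis only adds more values per channel. The sketch below computes those statistics for a nested list shaped [N][C][T][H][W]. The helper name is hypothetical, and this is not the Caffe2 operator's implementation.

```python
def per_channel_stats(x):
    """Compute per-channel mean and (biased) variance for x shaped [N][C][T][H][W]."""
    num_channels = len(x[0])
    means, variances = [], []
    for c in range(num_channels):
        # Pool every value of channel c across batch, time, height and width.
        values = [
            v
            for sample in x
            for frame in sample[c]
            for row in frame
            for v in row
        ]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        means.append(mean)
        variances.append(var)
    return means, variances
```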
"If I want to finetune this converted model, should I finetune it with MODEL.USE_AFFINE True or False?" The reason for freezing BN (by setting USE_AFFINE=True) is that our batch size per GPU is small, so BN doesn't work well. If your batch size with the new model is large enough (e.g. 8 per GPU), I think it will work better with BN turned on (USE_AFFINE=False). If your batch size is small (< 4 per GPU), I'd guess it would work better to set CHECKPOINT.CONVERT_MODEL=True, which converts the BN layers into frozen "affine" layers, and to train with the frozen BNs by setting USE_AFFINE=True.
"I ported a network from Pytorch to Caffe2 and converted the Pytorch version weight file to Caffe2 version weight, however, I cannot get the same result" I recommend double-checking that the architecture defined in your PyTorch model is exactly the same as the architecture defined in this repo (including details like striding, pooling size, etc.). Our architecture is slightly different from the original non-local network (see also https://arxiv.org/pdf/1812.05038.pdf, Appendix A).
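The rule of thumb in the reply above can be written down as a small helper. This is only a sketch: the function name is hypothetical, the thresholds (8 and 4) are taken from the "e.g." figures in the reply and are not hard limits, and the reply gives no guidance for batch sizes in between.

```python
def suggest_bn_overrides(batch_size_per_gpu):
    """Map a per-GPU batch size to the settings suggested in the reply.

    Returns None for sizes between the two cases the reply covers.
    """
    if batch_size_per_gpu >= 8:
        # Batch is large enough for real BN: keep BN trainable, no conversion.
        return {"MODEL.USE_AFFINE": False, "CHECKPOINT.CONVERT_MODEL": False}
    if batch_size_per_gpu < 4:
        # Batch is too small for BN: convert BN weights and use frozen affine layers.
        return {"MODEL.USE_AFFINE": True, "CHECKPOINT.CONVERT_MODEL": True}
    return None  # 4 to 7 per GPU: not covered by the reply, so decide empirically
```

For example, `suggest_bn_overrides(2)` selects the frozen "affine" setup, while `suggest_bn_overrides(8)` keeps BN enabled.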
Hi, @chaoyuaw , it's very nice of you to respond to my questions, thank you very much.
I have another question though, if I finetune the model with BN enabled, which means I set
MODEL:
  USE_BN: True
  USE_AFFINE: False
CHECKPOINT:
  CONVERT_MODEL: False
how should I set these 2 NONLOCAL related parameters?
NONLOCAL:
  USE_BN: False or True?
  USE_AFFINE: True or False?
Are USE_BN and USE_AFFINE parameters related to their counterparts in cfg.MODEL section?
Thanks!
If your original model uses a BN layer in the non-local (NL) block and you don't want to freeze it, set NONLOCAL.USE_BN: True and NONLOCAL.USE_AFFINE: False.
I recommend taking a look at https://github.com/facebookresearch/video-long-term-feature-banks/blob/master/lib/models/nonlocal_helper.py#L146 to see exactly what these options imply.
Thanks @chaoyuaw , I understand, thank you.
Hi @chaoyuaw, sorry to bother you. I'm confused about the SpatialBN layer in this repo.
I see in config files, these params are set as
I wonder, are these params related?
According to my understanding, if MODEL.USE_AFFINE=False (which means using SpatialBN), then CHECKPOINT.CONVERT_MODEL should be set to False. Is my understanding right?
I ported a network from PyTorch to Caffe2 and converted the PyTorch weight file to Caffe2 format; however, I cannot reproduce the PyTorch results with the converted weights. (The PyTorch model was trained with 3D BN.)
I suppose this has something to do with the BatchNorm operations. I see SpatialBN is 2D BN; can it be used in a model with 3D convolutions?
If I want to finetune this converted model, should I finetune it with MODEL.USE_AFFINE set to True or False?
Thanks!