Closed: kazuto1011 closed this issue 1 year ago.
Oops, this issue is quite old, sorry. As far as I remember, detaching or not might not make a big difference in performance. However, Andres mentioned that he had better experiences when he adds the detach: the whole network is then learned (instead of just skipping the middle part). But I'm not sure if there is a better explanation.
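For context, here is a minimal sketch of what detaching a skip connection changes (hypothetical layer names, not the actual backbone code): the detached skip is a constant for backpropagation, so gradients from the decoder can reach the encoder only through the middle of the network.

```python
import torch
import torch.nn as nn

enc = nn.Conv2d(8, 8, 3, padding=1)  # encoder stage producing the skip
mid = nn.Conv2d(8, 8, 3, padding=1)  # the "middle" of the network
dec = nn.Conv2d(8, 8, 3, padding=1)  # decoder stage consuming the skip
x = torch.randn(1, 8, 16, 16)

# Without detach: gradients reach `enc` both through `mid` and
# directly through the skip, so backprop can bypass the middle.
skip = enc(x)
dec(mid(skip) + skip).sum().backward()
g_plain = enc.weight.grad.clone()
enc.zero_grad(); mid.zero_grad(); dec.zero_grad()

# With detach: the skip is a constant for backprop, and any gradient
# reaching `enc` is forced to pass through `mid`.
skip = enc(x)
dec(mid(skip) + skip.detach()).sum().backward()
print(torch.allclose(g_plain, enc.weight.grad))  # False: the paths differ
```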
Thank you for your response. Okay, I understand that you adopted `detach()` empirically. I had thought that stopping gradient flow in the skip connections might bring inefficiency or instability.
Thank you for sharing your code. I found that all backbones call `detach()` in their skip connections. For example: https://github.com/PRBonn/lidar-bonnetal/blob/5a5f4b180117b08879ec97a3a05a3838bce6bb0f/train/backbones/squeezesegV2.py#L156
Could you tell me where this idea comes from? I cannot find the corresponding part in the official SqueezeSeg/SqueezeSegV2. Besides, I'm concerned that the first `detach()` in SqueezeSegV2 does not do what is expected: https://github.com/PRBonn/lidar-bonnetal/blob/5a5f4b180117b08879ec97a3a05a3838bce6bb0f/train/backbones/squeezesegV2.py#L170-L174

`skip_in` is detached and never referenced afterward, so the `self.conv1b` layer never receives gradients to update itself. Here is the quick check I did.
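A minimal sketch of such a check (stub convolutions stand in for the linked stem; the layer names follow the backbone, but the shapes and the decoder stand-in are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stub stem mirroring the pattern in the linked code (shapes are made up).
conv1a = nn.Conv2d(5, 32, 3, stride=2, padding=1)  # trunk branch
conv1b = nn.Conv2d(5, 32, 1)                       # skip branch
head = nn.Conv2d(32, 32, 3, padding=1)             # stand-in for the rest

x = torch.randn(1, 5, 64, 512)

# As in the linked code: the skip tensor is detached when stored,
# and only the detached copy is ever referenced afterward.
skip_in = conv1b(x)
skips = {1: skip_in.detach()}

y = head(conv1a(x))
# A real decoder would upsample y and fuse the skip; a sum is enough here.
loss = (F.interpolate(y, scale_factor=2.0) + skips[1]).sum()
loss.backward()

print(conv1a.weight.grad is None)  # False: the trunk receives gradients
print(conv1b.weight.grad is None)  # True: conv1b never receives gradients
```

Since `conv1b`'s output is only ever used through the detached copy, its `weight.grad` stays `None` after `backward()`, i.e. `conv1b` is never updated.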