Open visatish opened 4 years ago
Also another implementation question: In the paper you state that the fusion from lower resolution to higher resolution is accomplished with bilinear upsampling followed by a 1x1 convolution. However, in the actual implementation it seems like this is reversed as seen here. Can you clarify this?
multi_scale_output = False means returning only the high-resolution output (adopted in pose estimation). multi_scale_output = True means returning all four outputs (adopted in segmentation).
We apply the 1x1 convolution before the bilinear upsampling, which reduces the computational cost.
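A rough way to see the saving is to count multiply-accumulates (MACs) for the 1x1 convolution in each ordering. The sketch below is only illustrative: the helper names and the feature-map sizes are made up for the example, not taken from the repository, and the cost of the upsampling itself is ignored.

```python
def conv1x1_macs(height, width, in_channels, out_channels):
    """MACs for a 1x1 convolution over a height x width map (bias ignored)."""
    return height * width * in_channels * out_channels


def fuse_conv_macs(height, width, in_channels, out_channels, scale, conv_first):
    """MACs spent in the 1x1 conv when fusing a low-resolution branch upward.

    conv_first=True  : conv runs on the low-resolution map, then upsample.
    conv_first=False : upsample by `scale` first, then conv on the larger map.
    """
    if conv_first:
        return conv1x1_macs(height, width, in_channels, out_channels)
    return conv1x1_macs(height * scale, width * scale, in_channels, out_channels)


# Hypothetical sizes: a 16x12 map with 64 channels fused into a 64x48 branch
# with 32 channels (scale factor 4).
conv_then_up = fuse_conv_macs(16, 12, 64, 32, scale=4, conv_first=True)
up_then_conv = fuse_conv_macs(16, 12, 64, 32, scale=4, conv_first=False)

print(conv_then_up)                  # 393216
print(up_then_conv)                  # 6291456
print(up_then_conv // conv_then_up)  # 16, i.e. scale ** 2
```

Under these assumptions the convolution is cheaper by a factor of scale squared when it runs first. In this example the upsampling step also handles fewer channels (32 instead of 64), since the convolution has already reduced them.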
Interesting, okay. I was a bit confused because even when multi_scale_output = False, you do combine representations from the other branches in the fuse layer for the single output that is returned. It seems that the only real difference is that there is one less layer of "mixing", since the other outputs, which would be combined further down the road, are discarded. Do you find that this extra layer makes a significant difference in performance for each task?
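The point above can be sketched as a small "fusion plan". This is a toy model based on the description in this thread, not the repository's code: the function name is hypothetical, branch 0 is assumed to be the highest-resolution branch, and the resampling labels are assumptions drawn from the paper's description.

```python
def fuse_plan(num_branches, multi_scale_output=True):
    """Return, for each fused output, the branches that feed it and how.

    With multi_scale_output=False only output 0 (assumed to be the
    highest resolution) is built, but it still draws on every branch.
    """
    num_outputs = num_branches if multi_scale_output else 1
    plan = []
    for i in range(num_outputs):
        sources = []
        for j in range(num_branches):
            if j > i:
                op = "1x1 conv, then upsample"   # lower resolution -> higher
            elif j == i:
                op = "identity"
            else:
                op = "strided 3x3 conv(s)"       # higher resolution -> lower
            sources.append((j, op))
        plan.append(sources)
    return plan


single = fuse_plan(4, multi_scale_output=False)
multi = fuse_plan(4, multi_scale_output=True)

print(len(single), len(multi))
print([j for j, _ in single[0]])
```

In this sketch the single output still lists all four branches as sources, so the flag changes how many fused outputs are produced, not whether fusion happens.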
Thank you very much for your great work!
Is the following understanding correct?
multi_scale_output = False : HRNetV1
multi_scale_output = True : HRNetV2
reference:
@mucunwuxian Yes.
@visatish For pose estimation, we find that combining the 4 representations is slightly better than using only the high-resolution representation (a gain of 0.2 points on COCO val).
For semantic segmentation and object detection, combining 4 representations is helpful for handling diverse object scales and more categories.
@sunke123 Thank you for your reply!
Understood, it's very helpful! 💡
Hi,
I was looking through the codebase and noticed this comment that multi-scale outputs (to be honest I am not 100% sure what these are, and it would be helpful if you could clarify this, i.e. do they have any meaning in the context of the paper, or are they just a helpful argument for bookkeeping implementation-wise) are only used for the final module in a stage. However, this does not seem to be true for the actual implementation, because it seems that the multi_scale_output arg to _make_stage is always True, meaning that this if-statement will always evaluate to False, and reset_multi_scale_output will always be True, no matter whether or not it is the last module.
Could you clarify why this is the case and what exactly reset_multi_scale_output is doing? Maybe I am misunderstanding something.
Thanks,
Vishal
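For readers following along, the logic described in the question can be sketched as below. This is a reconstruction from the wording of the question, not a copy of the repository's code: the function name is hypothetical and the exact condition is an assumption.

```python
def module_flags(num_modules, multi_scale_output=True):
    """Per-module value of reset_multi_scale_output within one stage.

    Assumed rule: only the last module of the stage may switch to a
    single output, and only when the stage-level flag is False.
    """
    flags = []
    for i in range(num_modules):
        is_last = i == num_modules - 1
        if not multi_scale_output and is_last:
            reset_multi_scale_output = False
        else:
            reset_multi_scale_output = True
        flags.append(reset_multi_scale_output)
    return flags


print(module_flags(3, multi_scale_output=True))   # [True, True, True]
print(module_flags(3, multi_scale_output=False))  # [True, True, False]
```

Under this reading, a caller that always passes multi_scale_output=True gets True for every module, which matches the observation in the question. The last module differs only when the stage-level flag is False.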