The OCR approach is rephrased as Segmentation Transformer: https://arxiv.org/abs/1909.11065. This is an official implementation of semantic segmentation for HRNet. https://arxiv.org/abs/1908.07919
Question:
According to this line,
probs = F.softmax(self.scale * probs, dim=2)  # batch x k x hw
the input has shape [batch_size, num_class, fh*fw], and the softmax is taken over dim=2. This means the values sum to one across the feature-map positions (fh*fw) for each class.
However, I think the softmax should be taken over dim=1, so that the values sum to one across the classes (num_class) at each position.
The corrected code would be:
probs = F.softmax(self.scale * probs, dim=1)  # batch x num_class x hw
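To make the difference between the two choices concrete, here is a minimal sketch of what each normalization does to a tiny score table. It is written in plain Python so it runs without PyTorch, and it does not reproduce the repository's module. The helper names (softmax_over_pixels, softmax_over_classes) and the example scores are hypothetical, and the batch dimension and self.scale are left out for brevity. Assuming a k x hw layout, normalizing each row corresponds to dim=2 on a batch x k x hw tensor, and normalizing each column corresponds to dim=1.

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_over_pixels(scores):
    """Normalize each class row over the hw positions (like dim=2 on batch x k x hw)."""
    return [softmax(row) for row in scores]


def softmax_over_classes(scores):
    """Normalize each pixel column over the k classes (like dim=1 on batch x k x hw)."""
    columns = [softmax(list(col)) for col in zip(*scores)]
    # Transpose back to the k x hw layout.
    return [list(row) for row in zip(*columns)]


# Hypothetical scores for one image: k = 2 classes, hw = 3 flattened positions.
scores = [[1.0, 2.0, 3.0],
          [0.5, 0.5, 0.5]]

per_pixel = softmax_over_pixels(scores)   # one distribution over positions per class
per_class = softmax_over_classes(scores)  # one distribution over classes per position

row_sums = [sum(row) for row in per_pixel]        # k sums, each equal to 1
col_sums = [sum(col) for col in zip(*per_class)]  # hw sums, each equal to 1
```

With dim=2 each class gets a weight map over positions that sums to one; with dim=1 each position gets a distribution over classes. Which one is wanted depends on how probs is used afterwards.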
The line in question: https://github.com/HRNet/HRNet-Semantic-Segmentation/blob/HRNet-OCR/lib/models/seg_hrnet_ocr.py#L64