Closed: lartpang closed this issue 2 years ago
Hi pang, 'crop_feature' contains only the foreground features, since the unrelated parts have already been eliminated by the seeds (see lines 56-57).
@chenqi1126
Thank you for your reply.
Although the seeds eliminate the background here, that only accounts for the numerator of the formula. The denominator is computed differently from the formula. According to the formula, masked average pooling averages only over a specific region (R^i_k=1), whereas the current implementation averages over the entire feature map, which gives a larger denominator.
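To make the difference concrete, here is a minimal sketch in plain Python (not the repository's PyTorch code). The function names, the flattened list layout, and the toy values are all hypothetical and chosen only for illustration. It compares dividing the masked sum by the foreground area, as in the formula, with dividing it by the full map size h*w, as described for the implementation.

```python
def masked_avg_pool(features, mask):
    """Formula-style pooling: masked sum divided by the foreground area.

    features: list of h*w feature vectors (flattened spatial positions).
    mask: list of h*w values in {0, 1}, where 1 marks the region R^i_k = 1.
    """
    area = sum(mask)  # raises ZeroDivisionError below if the mask is empty
    dim = len(features[0])
    return [sum(f[c] * m for f, m in zip(features, mask)) / area
            for c in range(dim)]


def full_map_avg_pool(features, mask):
    """Implementation-style pooling as described in this thread:
    the same masked sum, but divided by the full map size h*w."""
    size = len(features)
    dim = len(features[0])
    return [sum(f[c] * m for f, m in zip(features, mask)) / size
            for c in range(dim)]


# Hypothetical 2x2 feature map, flattened to 4 positions with 2 channels.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
mask = [1, 1, 0, 0]  # two foreground positions

by_formula = masked_avg_pool(features, mask)
by_full_map = full_map_avg_pool(features, mask)

# The two results differ only by the factor (foreground area) / (h*w).
scale = sum(mask) / len(mask)
```

In this toy case the full-map version is the formula version multiplied by `scale`, so the two prototypes point in the same direction and differ only in magnitude.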
@lartpang
Thanks for your attention.
Yes, the formula and the implementation differ slightly. As you mentioned, the implementation uses a larger denominator, so it produces a scaled version of the value in the formula. This scaling does not affect the final result, because the value is normalized in the subsequent similarity calculation (see the line linked at the end of this comment; the official PyTorch documentation describes the normalization). The current implementation also simplifies the calculation and avoids the division by zero that unstable seeds can cause in the early training stage. https://github.com/chenqi1126/SIPE/blob/e0c6a3bc578a6bade2cf85fe34a1793dce71170d/network/resnet50_SIPE.py#L62
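The scale-invariance argument can be checked with a small sketch. It assumes the similarity in question is a cosine similarity, which this thread does not state explicitly. The helper below is a hypothetical plain-Python stand-in, not PyTorch's function, and its `eps` handling is an illustrative choice.

```python
import math


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity of two vectors, with the denominator clamped
    to eps so that a zero vector gives 0.0 instead of an error."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


# Hypothetical pixel feature and two prototypes that differ only in scale:
# one averaged over the foreground area, one over the whole map (h*w).
pixel = [1.0, 2.0]
prototype_formula = [2.0, 3.0]
prototype_full_map = [1.0, 1.5]  # 0.5 * prototype_formula

sim_formula = cosine_similarity(pixel, prototype_formula)
sim_full_map = cosine_similarity(pixel, prototype_full_map)

# An empty seed yields a zero masked sum; dividing by h*w keeps the
# prototype at zero, and the clamped denominator keeps this finite.
sim_empty_seed = cosine_similarity(pixel, [0.0, 0.0])
```

Under that assumption, any positive rescaling of the prototype cancels in the normalization, so both prototypes give the same similarity, and an empty seed gives a similarity of zero rather than a division error.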
Ok, got it, thanks!
https://github.com/chenqi1126/SIPE/blob/e0c6a3bc578a6bade2cf85fe34a1793dce71170d/network/resnet50_SIPE.py#L58
The two denominators are not the same: the code divides by the area of the entire feature map (h*w), while the formula divides by the area of the foreground.