Closed Xiaoming-Zhao closed 5 months ago
Hi Xiaoming,
Many thanks for the clarification! You are right about the scaling: while designing our PSNR-H metric, we were tempted to rescale it to be similar to PSNR-L, but that turns out to be inconsistent with the mathematical definition of PSNR (it introduces a constant bias). The grey-image setting, on the other hand, is introduced to standardize the evaluation of non-converged results: many methods fail to converge in some test cases and produce near-random outputs, leading to high variance in the score.
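The grey-image floor described above could be sketched roughly as follows. This is only an illustration of the idea, not the benchmark's actual code; `psnr_with_grey_floor` and its arguments are hypothetical names:

```python
import numpy as np

def psnr_with_grey_floor(pred, gt, max_value=1.0):
    """Hypothetical sketch: score a prediction, but never report below the
    score of a constant grey image, so divergent/random outputs receive a
    standardized floor instead of an arbitrarily low (or negative) PSNR."""
    def psnr(img):
        mse = np.mean((img - gt) ** 2)
        return 10.0 * np.log10(max_value ** 2 / mse)

    grey = np.full_like(gt, 0.5 * max_value)  # constant mid-grey baseline
    return max(psnr(pred), psnr(grey))

# A non-converged method producing noise is clamped to the grey baseline:
rng = np.random.default_rng(0)
gt = np.full((8, 8, 3), 0.6)
noisy_pred = rng.random((8, 8, 3))
print(psnr_with_grey_floor(noisy_pred, gt))
```

The floor makes scores comparable across methods even when some runs diverge, at the cost of the reported number no longer being the raw PSNR in those cases.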
Hence, given all the considerations and tricks, we require all competitors to run the same official evaluation for our benchmark to ensure fairness.
Best, Zhengfei
Hi, thanks a lot for the great dataset.
As we chatted offline, it may be better to add some clarification to the documentation about the PSNR computation, especially for this line: https://github.com/StanfordORB/Stanford-ORB/blob/962ea6d2cced6c9ea076fea4dc33464589036552/orb/utils/metrics.py#L86
Currently, the reported PSNR may not be the raw PSNR value, since it could come from a metric computed on a pure grey image, i.e., the `lb` value. In the current implementation, that lower bound is used to avoid negative values from the `mse_to_psnr` function: https://github.com/StanfordORB/Stanford-ORB/blob/962ea6d2cced6c9ea076fea4dc33464589036552/orb/utils/metrics.py#L6-L8

However, I think the appearance of negative values comes from a potentially inappropriate value range being used in `mse_to_psnr`: if the actual value range (which can be up to 4) were used in `mse_to_psnr`, there would not be any negative values either. I mentioned a value range of 4 because of this line: https://github.com/StanfordORB/Stanford-ORB/blob/962ea6d2cced6c9ea076fea4dc33464589036552/orb/utils/test.py#L67

It may be beneficial to clarify this.
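To make the value-range point concrete, here is a minimal sketch. It assumes the common NeRF-style definition PSNR = 10·log10(max_value² / MSE) with `max_value` defaulting to 1; the exact signature in the repository may differ:

```python
import numpy as np

def mse_to_psnr(mse, max_value=1.0):
    # PSNR = 10 * log10(max_value^2 / mse).
    # This is negative exactly when mse > max_value^2, which can happen
    # when pixel values exceed the assumed range (e.g. HDR values up to 4
    # evaluated with max_value=1).
    return 10.0 * np.log10(max_value ** 2 / mse)

mse = 2.5  # plausible for images with values up to 4
print(mse_to_psnr(mse, max_value=1.0))  # negative: ~-3.98 dB
print(mse_to_psnr(mse, max_value=4.0))  # positive: ~8.06 dB
```

So with the value range of 4 folded into the PSNR formula, the result stays non-negative and no grey-image lower bound would be needed for that purpose.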
Overall, thanks again for this amazing dataset.