HCIILAB / Scene-Text-Removal

EnsNet: Ensconce Text in the Wild

About PSNR and l2 #11

Open neouyghur opened 4 years ago

neouyghur commented 4 years ago

Hi, I am checking the results you provide in Table 1. I find that the PSNR values do not correspond to the l2 values. For example, an l2 of 0.2465 corresponds to a PSNR of 25.60, while an l2 of 0.0627 corresponds to a PSNR of only 24.83; also, the Scene Text Eraser l2 error is very high. Could you check this, or could you offer your model for testing? Thanks.

neouyghur commented 4 years ago

I plotted your PSNR and l2 scores in a figure. It clearly shows that your results are not consistent. Could you explain why? @naoto0804 Did you get the same PSNR scores? Thanks.

[Figure: plot of the PSNR and l2 scores from Table 1]

zhangshuaitao commented 4 years ago

@neouyghur Both the PSNR and l2 scores are averages over all the test images.

neouyghur commented 4 years ago

@zhangshuaitao I am comparing my method with yours and following the same protocol; however, my MSE and PSNR curves share the same trend. Besides, as we know, the PSNR score is computed from the MSE score.
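For reference, on a single image PSNR is a monotone function of MSE, PSNR = 10·log10(MAX²/MSE), so a lower MSE always means a higher PSNR; but when each metric is averaged over the test set independently, the mean PSNR is no longer determined by the mean MSE. A minimal sketch, assuming pixel values normalized to [0, 1] and hypothetical per-image MSEs:

```python
import numpy as np

def psnr_from_mse(mse, max_val=1.0):
    """PSNR in dB from MSE, for pixel values in [0, max_val]."""
    return 10.0 * np.log10(max_val ** 2 / mse)

# Hypothetical per-image MSEs for two methods on a two-image test set.
mses_a = np.array([0.001, 0.100])
mses_b = np.array([0.040, 0.040])

# Averaging each metric independently breaks the per-image correspondence:
print(mses_a.mean(), psnr_from_mse(mses_a).mean())  # 0.0505, 20.00 dB
print(mses_b.mean(), psnr_from_mse(mses_b).mean())  # 0.0400, ~13.98 dB
# Method A shows a *higher* mean MSE and a *higher* mean PSNR than method B.
```

This kind of averaging effect is one way a table of per-method means can look inconsistent even when every per-image pair is consistent.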

neouyghur commented 4 years ago

@zhangshuaitao Is your l2 score RMSE or MSE? Thanks.

zhangshuaitao commented 4 years ago

@neouyghur The l2 score is MSE. We use the compare_mse, compare_ssim, and compare_psnr functions from the skimage.measure module.

naoto0804 commented 4 years ago

@zhangshuaitao First of all, thanks again for releasing the code and answering a lot of questions patiently.

However, what you say above seems to be inconsistent with the README.md, which says: "To evaluate the model performance over a dataset, you can find the evaluation metrics in this website PythonCode.zip." Which is correct?

naoto0804 commented 4 years ago

I would really appreciate it if you could provide the whole evaluation pipeline.

It might be hard to follow exactly the same evaluation protocol, since some parameters of each function are unknown (e.g., compare_ssim has some optional parameters; how did you set them? What is the range of the image values, 0.0~1.0 or 0~255?).

zhangshuaitao commented 4 years ago

@naoto0804 Sorry for not explaining it clearly. We use AGE, pEPs, and pCEPs from PythonCode.zip, and the compare_mse, compare_ssim, and compare_psnr functions from the skimage.measure module. The default parameters for those functions are fine.
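A minimal per-image sketch of the skimage calls described above, assuming images loaded as floats in [0, 1]; the authors say the defaults suffice, and multichannel/data_range are spelled out here only to flag the choices asked about earlier. Note that in scikit-image >= 0.18 these functions were moved to skimage.metrics (mean_squared_error, structural_similarity, peak_signal_noise_ratio):

```python
from skimage import io, img_as_float
from skimage.measure import compare_mse, compare_ssim, compare_psnr

# Placeholder file names; substitute one prediction/ground-truth pair.
pred = img_as_float(io.imread('pred.png'))  # rescales uint8 to floats in [0, 1]
gt = img_as_float(io.imread('gt.png'))

# compare_mse does not rescale internally, so passing 0-255 uint8 arrays
# instead of 0-1 floats changes the MSE scale -- the ambiguity raised above.
mse = compare_mse(gt, pred)
ssim = compare_ssim(gt, pred, multichannel=True, data_range=1.0)  # RGB input
psnr = compare_psnr(gt, pred, data_range=1.0)
print(f'mse {mse:.6f}  ssim {ssim:.6f}  psnr {psnr:.6f}')
```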

naoto0804 commented 4 years ago

@zhangshuaitao Thank you so much for making this much clearer.

To check whether I followed your instructions exactly, I computed all the metrics between the original input images and the ground-truth images in the test subset of the synthetic dataset. I did this because I want to isolate differences in the evaluation phase before reproducing the training phase.

The results are as follows; do you think they are reasonable? If possible, could you compute them with your dataset and evaluation code? (I suspect there are still bugs in my implementation, since these values are much better than those of the baseline method in Table 1.)

mse 0.006965
ssim 0.933875
psnr 23.996012
AGE 5.851178
pEPs 0.064378
pCEPs 0.050264
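A sketch of the dataset-level protocol implied above (per-image metrics, then an independent average for each metric, matching the earlier comment that scores are averages over all test images); directory names and the file-pairing convention are hypothetical:

```python
import os
import numpy as np
from skimage import io, img_as_float
from skimage.measure import compare_mse, compare_ssim, compare_psnr

# Hypothetical layout: identically named files in two directories.
pred_dir, gt_dir = 'results', 'ground_truth'
scores = {'mse': [], 'ssim': [], 'psnr': []}

for name in sorted(os.listdir(gt_dir)):
    gt = img_as_float(io.imread(os.path.join(gt_dir, name)))
    pred = img_as_float(io.imread(os.path.join(pred_dir, name)))
    scores['mse'].append(compare_mse(gt, pred))
    scores['ssim'].append(compare_ssim(gt, pred, multichannel=True, data_range=1.0))
    scores['psnr'].append(compare_psnr(gt, pred, data_range=1.0))

# Each metric is averaged on its own, so the mean PSNR reported this way
# is not derived from the mean MSE (see the note earlier in the thread).
for metric, values in scores.items():
    print(metric, np.mean(values))
```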

neouyghur commented 4 years ago

@naoto0804 @zhangshuaitao I think your result is reasonable, since only a small part of each scene is text. However, I feel they did not fully train the baselines; with more training, the baselines should get better results than the ones they reported.