dvlab-research / Video-P2P

Video-P2P: Video Editing with Cross-attention Control
https://video-p2p.github.io/

How to calculate quantitative metrics #5

Closed shliu0 closed 1 year ago

shliu0 commented 1 year ago

Hi, I really appreciate your excellent work. Could you share more details about how you calculate these metrics? The paper presents four quantitative metrics (three proposed previously, one proposed in your paper) and says the details are in the appendix, but I can't find the appendix. For example: what is the mask in M.PSNR? How is PSNR calculated (averaged over the R, G, and B channels, or computed over RGB as a whole and then divided by 3)? Which version of LPIPS do you use, and is the video LPIPS the average of the per-frame LPIPS? What is the definition of OSV?

Thanks

ShaoTengLiu commented 1 year ago

Please refer to:

appendix_1 appendix_2

shliu0 commented 1 year ago

Thanks a lot! Two more questions to make sure I understand correctly:

  1. Is PSNR calculated over the changed object regions instead of the background?
  2. In eq. (15), writing Var(*) for short, is * of size (n, c)? If so, how is this variance calculated?

JulianJuaner commented 1 year ago

Hi,

  1. In eq. (14), M.PSNR is calculated over the unchanged regions (background), because we need to ensure that regions unrelated to the edit stay unchanged after editing.
  2. You are right: in eq. (15), * is of size (n, c). The variance is calculated along the frame dimension (n), and we sum the per-channel variances to get the OSV result.
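
For readers without the appendix at hand, the two answers above can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `masked_psnr` and `variance_over_frames_summed` are hypothetical, the mask is assumed to be 1 on unchanged (background) pixels, the squared error is pooled over all kept values rather than averaged per channel (the thread does not settle this), the peak value is assumed to be 255, and the variance is the population variance (dividing by n). What the (n, c) quantity in eq. (15) actually contains is defined in the paper's appendix and is not reproduced here.

```python
import math


def masked_psnr(src, out, keep, max_val=255.0):
    """PSNR restricted to the values where `keep` is truthy.

    src, out: flat sequences of pixel values for one frame.
    keep:     same length; 1 marks unchanged (background) values (assumption).
    """
    sq_err = [(a - b) ** 2 for a, b, k in zip(src, out, keep) if k]
    if not sq_err:
        raise ValueError("mask selects no pixels")
    mse = sum(sq_err) / len(sq_err)  # pooled over all kept values (assumption)
    if mse == 0:
        return math.inf  # identical background
    return 10.0 * math.log10(max_val ** 2 / mse)


def variance_over_frames_summed(features):
    """Variance along the frame axis, summed over channels.

    features: n rows (frames), each with c values (channels).
    Uses the population variance, i.e. divides by n (assumption).
    """
    n = len(features)
    total = 0.0
    for channel in zip(*features):  # iterate over the c columns
        mean = sum(channel) / n
        total += sum((v - mean) ** 2 for v in channel) / n
    return total


if __name__ == "__main__":
    # The edited pixel (third value) is masked out, so the background matches exactly.
    print(masked_psnr([0, 10, 20], [0, 10, 120], [1, 1, 0]))
    # Two frames, two channels: only the first channel varies across frames.
    print(variance_over_frames_summed([[1.0, 2.0], [3.0, 2.0]]))
```

A video-level M.PSNR would then presumably be an average of the per-frame values, but the thread does not confirm this.
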
shliu0 commented 1 year ago

Got it. Thanks for your explanation!