NVlabs / extreme-view-synth

What is the extremity? #7

Closed kwea123 closed 4 years ago

kwea123 commented 4 years ago

The very first figure of your paper shows that your method can synthesize views at up to 30x the original baseline. I wonder how you evaluated that: is it just by human eye judgment? You mentioned that Stereo Magnification can magnify the baseline up to 4.5x, but I also can't find any numeric results in their paper that support the claim. So is everyone only doing qualitative judgment so far?

Another question is about the camera displacement. To my knowledge, nearly all works synthesize views with only horizontal displacement and barely any forward/backward movement. In your Figure 10, it seems that your method can also synthesize views for a forward-displaced camera. In this case, what is the limit of your method? Is it also up to 30x?

Finally, do you know of any work that explicitly evaluates the relation between quality and the amount of displacement? For example, when displaced by 10x the PSNR is aaa, and when displaced by 20x the PSNR is bbb?

orazio-gallo-nvidia commented 4 years ago

Hi, this is probably a discussion for an email thread on the paper, not an issue related to the code.

With that said, if you're asking how we evaluate the results: we can numerically evaluate only the results for which we have ground truth (GT) images, and it is common practice in the community to rely on subjective evaluations when GT images are not available. For numerical evaluations it is common to use synthetic data, data with GT (e.g., a real video in which the camera moves and the task is to predict one of the future frames), or user studies. We discuss our evaluation in Section 7 of the paper, where we also provide numerical evaluations. Please take a look.

Our method (like others, actually) allows for any camera movement. But you're right, the type of motion does affect how far the camera can move before artifacts become too strong (and so do the scene, the number of input cameras, etc.). I believe that for the specific example you refer to we didn't try such a large displacement.

As for your final question: I don't know but that's a good idea--we should have done something like that.
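
For reference, the kind of evaluation asked about above (quality as a function of displacement) could be sketched roughly as follows. This is only an illustration and is not taken from the paper or this repository: the function names `psnr` and `psnr_by_magnification`, the flattened-list image representation, and the idea of grouping results by a "baseline magnification" value are all assumptions made for the example. It also assumes GT images are available for every synthesized view.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple


def psnr(pred: Sequence[float], gt: Sequence[float], max_val: float = 255.0) -> float:
    """PSNR in dB between two equally sized, flattened images."""
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixel values.
    mse = sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def psnr_by_magnification(
    samples: Iterable[Tuple[float, Sequence[float], Sequence[float]]]
) -> Dict[float, float]:
    """Average PSNR per (hypothetical) baseline magnification.

    Each sample is (magnification, synthesized_image, gt_image).
    """
    buckets: Dict[float, List[float]] = {}
    for magnification, pred, gt in samples:
        buckets.setdefault(magnification, []).append(psnr(pred, gt))
    # Sort by magnification so the result reads as a quality-vs-displacement curve.
    return {m: sum(v) / len(v) for m, v in sorted(buckets.items())}
```

Feeding it triples such as `(10.0, synthesized, gt)` and `(20.0, synthesized, gt)` would give one averaged PSNR value per displacement level, which is the table the question describes.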