Closed. AIRedWood closed this issue 5 years ago.
Or "Two-Frame Motion Estimation Based on Polynomial Expansion"?
@AIRedWood The original VESPCN paper, and the CVPR17 paper "Detail-revealing Deep Video Super-Resolution", whose testing model is open-sourced.
Actually, most modern flow-estimation networks (such as FlowNet 1/2, PWC-Net, LiteFlowNet, ...) never use "normalized" flow; they always regress absolute pixel values.
Perhaps it's unwise to constrain the movement to just 1 pixel, because that limit is quite unnatural; combining ideas from FlowNet2 or PWC-Net may be a possible way forward.
I read the FRVSR paper and found that its FNet is just a plain stack of convolutions ending in a `tanh`, whereas VESPCN uses a coarse-to-fine warping technique, and I don't know which one is better.
I'll look at the FlowNet and PWC-Net papers later.
And I'm thinking about a problem. VESPCN interprets the `tanh` output in [-1, 1] as displacement in normalized space, so that ±1 corresponds to W (the picture size). But using the [-1, 1] output directly effectively sets W = 1, meaning the maximum displacement cannot exceed 1 pixel. So it should be possible to introduce a parameter x with 1 < x < W and scale the output to x·[-1, 1], so that the maximum displacement does not exceed x, where x can be 2, 3, ....
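A minimal NumPy sketch of that idea (the function name and the `max_disp` parameter are hypothetical, standing in for the proposed x):

```python
import numpy as np

def scaled_tanh_flow(raw_flow, max_disp):
    """Scale a tanh activation so flow lies in [-max_disp, max_disp] pixels.

    raw_flow: pre-activation output of the flow sub-network's last conv layer.
    max_disp: the proposed parameter x (1 < x < W), e.g. 2 or 3.
    """
    return max_disp * np.tanh(raw_flow)

# With max_disp = 3, displacement is bounded by 3 pixels instead of 1.
flow = scaled_tanh_flow(np.array([-10.0, 0.0, 10.0]), max_disp=3)
```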
@LoSealL
@AIRedWood Let's discuss two cases:
In many deep-learning optical-flow networks, there's a major benchmark, Sintel. Currently, the top methods output optical flow in a pyramidal way, and their last level is the full or half resolution of the flow. In detail, the final output layer is usually a `conv2d` without activation, and the objective function is mainly EPE (end-point error, similar to L2). Their outputs represent absolute movement in pixels along the two spatial directions.
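As a concrete illustration of EPE on absolute flow (a generic sketch, not any particular paper's implementation):

```python
import numpy as np

def epe(flow_pred, flow_gt):
    """End-point error: mean Euclidean distance between predicted and
    ground-truth flow vectors, both in absolute pixels, shape (H, W, 2)."""
    return float(np.mean(np.linalg.norm(flow_pred - flow_gt, axis=-1)))

pred = np.zeros((4, 4, 2))
gt = np.ones((4, 4, 2))  # every vector is off by (1, 1)
err = epe(pred, gt)      # each pixel contributes sqrt(2), so the mean is sqrt(2)
```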
Now let's look at VSR's optical-flow sub-networks. In early work such as VESPCN, a small net is used; the last layer is a `conv2d` with `tanh` activation, which maps outputs to (-1, 1). The same architecture is used in SPMC (ICCV17) and FRVSR (CVPR18). The difference is that only in VESPCN did the author explicitly interpret the meaning of `tanh` (Section 2.3, last sentence of the 3rd paragraph):
> Output activations use tanh to represent pixel displacement in normalised space, such that a displacement of ±1 means maximum displacement from the center to the border of the image.
The other papers didn't mention that. In the released SPMC model, I found they warp with the flow directly, without denormalizing it, so they treat the `tanh` output as a constraint limiting movement to no more than 1 pixel in LR space.
It's worth noting that a 1-pixel limit in LR space means a 4-pixel limit in HR space (for x4 up-scaling), so I think it's not that bad, since 4 pixels in HR space is an acceptably slow movement; that's why you won't find very fast motion in VSR benchmarks.
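To make the two interpretations concrete, here is a sketch of the denormalization step that would map VESPCN's stated convention back to pixels (the function is hypothetical; skipping it, as the released SPMC model does, turns ±1 into a ±1-pixel limit):

```python
import numpy as np

def denormalize_flow(flow_norm, height, width):
    """Map tanh-normalized flow in [-1, 1] to absolute pixel displacement,
    following VESPCN's convention (±1 = center-to-border of the image)."""
    flow_px = flow_norm.copy()
    flow_px[..., 0] *= width / 2.0   # horizontal component
    flow_px[..., 1] *= height / 2.0  # vertical component
    return flow_px

# A normalized value of 1.0 on a 64x64 LR frame becomes a 32-pixel shift;
# warping with the raw tanh output instead treats it as a 1-pixel shift.
flow = np.full((64, 64, 2), 1.0)
flow_px = denormalize_flow(flow, 64, 64)
```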
Back to your idea: coarse-to-fine regression is a good strategy, but be aware that the spatial transformer network in VESPCN is only 10 layers deep; for comparison, FRVSR has 14 CNN layers and PWC-Net has more than 50. My point: extending the pixel-displacement limit is fine, but the capacity of the flow network should be increased accordingly.
Hello, I tested how the `loss` and `loss_me` curves change over iterations with `normalized` set to True or False, and found that with `normalized=False` the model converges significantly faster than with True. It's really surprising, so I want to explore the reason in detail. Is the reference paper you mentioned in issue 10 "High Accuracy Optical Flow Estimation Based on a Theory for Warping"?