NVlabs / PWC-Net

PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume, CVPR 2018 (Oral)

Not Using the First Feature (c11 and c21) for the flow estimation; Upsampling not present in the shallowest flow estimation #137

Open Avishka-Perera opened 7 months ago

Avishka-Perera commented 7 months ago

Dear @deqings, @mingyuliutw, @jrenzhile, @KinglittleQ,

This issue is based on the file PyTorch/models/PWCNet.py, which was initially committed by @mingyuliutw. We have two questions.

  1. c11 and c21 are not used for flow estimation
  2. Upsampling operation is not done in the pyramid level 1

1. c11 and c21 are not used for flow estimation

We noticed that the PyTorch implementation does not use the first-level features for flow estimation. Specifically, on lines 182 and 183, the first-level features (c11 and c21) are computed through convolution operations, but they are used only as input to the next convolution stack, not for flow estimation.

In contrast, all the other features (c12, c22, c13, c23, ...) are used in the successive flow estimation steps, both for warping and for computing the corresponding correlation volume.

As a result, the predicted flow is smaller than expected by a factor of 2. ----(A)

We suspect that this can be handled by adding another flow estimation block, starting at line 264, that uses c11 and c21.
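The resolution bookkeeping behind this can be sketched with a few lines of arithmetic. The sketch assumes that every pyramid level comes from a stride-2, 3x3 convolution with padding 1 and that there are six levels; `pyramid_shapes` is a hypothetical helper written for illustration, not a function from the repository.

```python
def pyramid_shapes(height, width, num_levels=6):
    """Return {level: (h, w)} for a feature pyramid.

    Assumption: each level halves the resolution with a stride-2,
    kernel-3, padding-1 convolution, so the output size is
    floor((n + 2*1 - 3) / 2) + 1, which equals (n + 1) // 2.
    """
    shapes = {}
    h, w = height, width
    for level in range(1, num_levels + 1):
        h = (h + 1) // 2
        w = (w + 1) // 2
        shapes[level] = (h, w)
    return shapes


shapes = pyramid_shapes(384, 512)
for level, (h, w) in shapes.items():
    print(f"level {level}: {h} x {w}")
```

Under these assumptions, level 1 (c11, c21) is at half the input resolution and level 2 (c12, c22) is at one quarter, so a flow estimated down to level 2 only has one quarter of the input height and width.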

2. Upsampling operation is not done in the pyramid level 1

A transposed convolution upsamples the predicted flow and the decoder features by a factor of 2 at every pyramid level (lines 206, 207; lines 220, 221; lines 234, 235; lines 250, 251) except the last one.

As a result, the predicted flow is smaller than expected by a factor of 2. ----(B)

We suspect that this can be handled by adding another transposed convolution starting at line 264 and then sending the result through the context network.

As a combined result of (A) and (B), the final flow is 4 times smaller than the input along each spatial dimension. We would like to know whether this is a mistake in the code, or whether we are supposed to simply interpolate the flow and scale it by a factor of 4 before comparing it against the ground truth.
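For reference, the "interpolate and scale" option could look like the minimal sketch below. `upsample_flow` is a hypothetical helper, not part of the repository. The value of `value_scale` is an assumption: multiplying by 4 is only appropriate if the network predicts displacements in quarter-resolution pixel units, and a different factor is needed if the flow was trained in other units.

```python
import torch
import torch.nn.functional as F


def upsample_flow(flow, size_factor=4, value_scale=4.0):
    """Bilinearly upsample a flow field and rescale its vectors.

    flow: tensor of shape (N, 2, h, w).
    size_factor: spatial upsampling factor (assumed 4 here).
    value_scale: multiplier for the flow vectors; 4.0 assumes the
        input flow is expressed in low-resolution pixel units.
    """
    up = F.interpolate(
        flow, scale_factor=size_factor, mode="bilinear", align_corners=False
    )
    return up * value_scale


# A constant quarter-resolution flow of 1 pixel becomes 4 pixels at full size.
flow_quarter = torch.ones(1, 2, 96, 128)
flow_full = upsample_flow(flow_quarter)
print(tuple(flow_full.shape))
```

Keeping the spatial factor and the value multiplier as separate arguments makes it easy to resize the flow without assuming which units the network output is in.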

Any help is greatly appreciated 😃

Thank you, Kind regards.

Avishka-Perera commented 7 months ago

Small update,

I figured out the reason for question 2, so I'll withdraw it.

Instead, wouldn't refining the flow at the image level as well give better results? Is there a specific reason why this hasn't been implemented in PWCNet? I'm about to test this out. Before that, I wanted to ask whether you have any thoughts on it.

Kind regards, Avishka Perera