anuragranj / cc

Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation
https://research.nvidia.com/publication/2018-05_Adversarial-Collaboration-Joint
MIT License

Code does not reproduce results from Figure 4 #28

Open mmbannert opened 1 year ago

mmbannert commented 1 year ago

Dear Anurag,

I ran test_flow.py in an effort to reproduce the results shown in Figure 4 of the paper. However, my qualitative results differed substantially from those reported there.

[image: fig4_failure2reproduce]

Comparing Figure 4 from the paper with my results, you immediately see that the soft consensus mask has the opposite contrast to the one shown in the paper. (The paper says that high values of m indicate static scene pixels.) A merely flipped contrast would not be a problem in itself; one could simply invert the mask (a quick check is sketched below). But even assuming flipped contrast, the comparison still puzzles me.
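To rule out a pure convention mismatch, this is the trivial inversion I have in mind; the helper is hypothetical and not part of this repo:

```python
import torch

def flip_consensus_contrast(soft_mask: torch.Tensor) -> torch.Tensor:
    """Invert a soft consensus mask with values in [0, 1] so that high
    values mark static scene pixels, matching the paper's convention.
    (Hypothetical helper, not part of this repo.)"""
    return 1.0 - soft_mask
```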

In the left-most of my examples, there seems to be some kind of saturation (or ceiling/floor) effect that produces a white rim around the image, especially at the bottom and the sides. I presume this falsely marks these peripheral pixels as nonrigid. You can see this to some extent in the original figure as well, but it is much weaker there. Consequently, the model predicts large patches of nonrigid motion: train tracks and trees on the left and the grass on the right. The example in the middle shows a similar problem: there are quite large white areas where the original shows none. This may explain why the model predicts too much nonrigid motion on the right side, where there is just grass in shadow. The fourth example from the left likewise shows too much nonrigid motion on the right side, where there is only a building. Maybe the motion segmentation does not work properly? Just guessing; a small diagnostic is sketched below.
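To quantify the white-rim impression, one could compare the fraction of saturated mask values in a border band against the interior. This is only a sketch; the function and the band width are my own assumptions:

```python
import numpy as np

def border_saturation(mask: np.ndarray, border: int = 20,
                      low: float = 0.05, high: float = 0.95):
    """Fraction of saturated pixels (near 0 or 1) in a border band vs. the
    interior of a 2-D soft mask with values in [0, 1]. A much higher border
    fraction would support the ceiling/floor effect described above."""
    sat = (mask < low) | (mask > high)
    interior = sat[border:-border, border:-border]
    n_border = sat.size - interior.size
    return float((sat.sum() - interior.sum()) / n_border), float(interior.mean())
```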

I ran the code as follows:

 ipython --pdb -- test_flow.py \
     --pretrained-disp ../../cc-models/geometry/dispnet_k.pth.tar \
     --pretrained-pose ../../cc-models/geometry/posenet.pth.tar \
     --pretrained-mask ../../cc-models/geometry/masknet.pth.tar \
     --pretrained-flow ../../cc-models/geometry/back2future.pth.tar \
     --kitti-dir ../../stimuli/kitti2015 \
     --output-dir ../../results/competitive_collaboration/kitti2015_test_flow_demo/thresh_1e-2

Cheers, Michael

mmbannert commented 1 year ago

I have added the model predictions for the samples shown in Figure 7 of the paper. The problems are easier to see here. The parked cars are falsely predicted to be moving, and the consensus masks do not look very similar to those shown in the paper, again suggesting that something might be off with the motion segmentation.

[image: fig7_failure2reproduce]

mmbannert commented 1 year ago

The results look a lot better when test_flow.py computes the mask in the same way as test_mask.py does. I used the default mask threshold of 0.94 to achieve these results; the thresholding step I mean is sketched below.
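For clarity, this is the kind of binarization I applied before using the mask; the helper name and tensor layout are my assumptions, not the repo's API:

```python
import torch

THRESHOLD = 0.94  # default mask threshold mentioned above

def binarize_mask(soft_mask: torch.Tensor, thresh: float = THRESHOLD) -> torch.Tensor:
    """Turn the soft consensus mask into a hard 0/1 mask before using it,
    mirroring what test_mask.py does. soft_mask is assumed to lie in
    [0, 1], with high values marking static scene pixels."""
    return (soft_mask > thresh).float()
```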

Figure 4

[image: fig4_failure2reproduce]

Figure 7

[image: fig7_failure2reproduce]