gengshan-y / expansion

Upgrading Optical Flow to 3D Scene Flow through Optical Expansion, CVPR 2020 (Oral).
https://gengshan-y.github.io/expansion/
MIT License

How should I interpret negative Z values? #12

Open tonytu16 opened 3 years ago

tonytu16 commented 3 years ago

Hello, thank you for the great work! I have a question regarding using the following equation to calculate the Z values:

Screen Shot 2021-01-05 at 11 35 41 AM

I used your code to calculate the tau for the following image:

frame000400

and visualized the Z output on a rainbow colormap:

absolute400

Screen Shot 2021-01-02 at 5 16 27 PM

I also ran the time-to-collision calculation on the same frame:

Screen Shot 2021-01-01 at 7 07 43 PM

ttc400

Screen Shot 2021-01-02 at 3 32 01 PM

The black color represents negative values. From my understanding, both the absolute depth and the time to collision should be non-negative. However, a significant portion of the frame has negative values. The pfm files contain tau values that range from roughly -0.02 to 0.01, which, after taking the exponential, range from roughly 0.98 to 1.01. When I subtract these from a matrix of ones, the entries greater than 1 produce the negative values. Am I misunderstanding something?

Thank you

gengshan-y commented 3 years ago

Hi, the negative time-to-collision in your results is expected for points that appear to be moving away from the camera.

To get a reasonable TTC, one assumption in this work is that points are moving towards the camera, such that (Z-Z')>0. Another assumption is that points are moving at constant velocity, such that velocity = (Z-Z')/T_sampling and TTC = Z/velocity.
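The two assumptions above can be turned into a small sketch. It assumes, as the question describes, that the saved value is log(tau) with tau = Z'/Z, so that TTC = Z/velocity = T_sampling/(1 - tau). The function name `ttc_from_log_tau` is hypothetical and is not part of the repository.

```python
import math


def ttc_from_log_tau(log_tau, t_sampling=1.0):
    """Time-to-collision under constant velocity, from log motion-in-depth.

    Assumes tau = Z'/Z, so that
    TTC = Z / ((Z - Z') / T_sampling) = T_sampling / (1 - tau).
    """
    tau = math.exp(log_tau)
    denom = 1.0 - tau
    if denom == 0.0:
        # No change in depth: the point never reaches the camera.
        return math.inf
    return t_sampling / denom


# Approaching point (tau < 1): positive TTC, roughly 50.5 sampling intervals.
approaching = ttc_from_log_tau(-0.02)
# Receding point (tau > 1): negative TTC, roughly -99.5 sampling intervals.
receding = ttc_from_log_tau(0.01)
```

With the value range reported in the question (about -0.02 to 0.01), both signs occur, which matches the mix of positive and negative pixels in the visualization.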

That said, in the negative TTC case, when points appear to be moving away, the absolute value of the TTC can be interpreted as the "time to doubling the distance".
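A quick numerical check of the "time to doubling the distance" reading, as a sketch under the same constant-velocity assumption. The helper names and the example depths are made up for illustration.

```python
import math


def ttc(z, z_next, t_sampling=1.0):
    """TTC = Z / velocity, with velocity = (Z - Z') / T_sampling."""
    return z * t_sampling / (z - z_next)


def distance_after(z, z_next, elapsed, t_sampling=1.0):
    """Depth after `elapsed` time if the point keeps its current velocity."""
    rate_of_change = (z_next - z) / t_sampling  # positive when receding
    return z + rate_of_change * elapsed


# A point receding from depth 10.0 to 10.1 in one sampling interval.
z, z_next = 10.0, 10.1
t = ttc(z, z_next)                        # negative, since Z - Z' < 0
doubled = distance_after(z, z_next, abs(t))
```

Here `doubled` comes out at twice the starting depth (up to floating-point error), because Z + ((Z' - Z)/T) * (Z*T/(Z' - Z)) = 2Z.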

tonytu16 commented 3 years ago

So to obtain the time & absolute depth I should take the absolute value of the negative values and divide by 2? Does this apply to both the TTC and Z?

gengshan-y commented 3 years ago

Negative TTC implies the points will never collide with the imaging plane so the time would be infinity.

For depth I think taking the absolute value of TTC without dividing by 2 will work.
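One way to apply the two suggestions above, as a sketch only: negative TTC values are mapped to infinity, and the magnitude is kept separately for anything that needs a non-negative value. The function names are hypothetical, and plain Python lists stand in for the per-pixel maps.

```python
import math


def collision_times(ttc_values):
    """Negative TTC means the point never reaches the image plane."""
    return [math.inf if t < 0 else t for t in ttc_values]


def magnitudes(values):
    """Absolute values, with no division by 2, following the tentative suggestion."""
    return [abs(v) for v in values]


ttc_values = [50.5, -99.5, 12.0]
times = collision_times(ttc_values)   # the second entry becomes infinity
sizes = magnitudes(ttc_values)        # the second entry becomes 99.5
```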

Liyiwei12138 commented 4 months ago

Hello, I think that for this formula to give accurate depth, the target needs to be static. How should this depth estimation formula be used in a dynamic scene? @gengshan-y

gengshan-y commented 4 months ago

Hi, you are right about the rigidity assumption. For dynamic scenes, one thought is to break the scene into locally rigid pieces and apply the same algorithm to each piece. But note that there is a scale ambiguity between pieces. You may want to look at superpixel soup and rigidmask for a deeper analysis.
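The scale ambiguity between pieces can be illustrated with a toy example. This is a sketch that assumes tau = Z'/Z and that each rigid piece has its own relative translation along the optical axis; the numbers and function names are invented for illustration.

```python
import math


def log_tau(z, relative_translation):
    """Log motion-in-depth of a point at depth z that gets closer by
    `relative_translation` between the two frames."""
    z_next = z - relative_translation
    return math.log(z_next / z)


def depth_over_translation(log_tau_value):
    """Only the ratio Z / translation is recoverable from tau: 1 / (1 - tau)."""
    return 1.0 / (1.0 - math.exp(log_tau_value))


# Piece A: depth 10, relative translation 1. Piece B: depth 20, translation 2.
piece_a = log_tau(10.0, 1.0)
piece_b = log_tau(20.0, 2.0)
ratio_a = depth_over_translation(piece_a)
ratio_b = depth_over_translation(piece_b)
```

Both pieces produce the same tau, so their depths can only be told apart once the relative translation of each piece is known.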

Liyiwei12138 commented 4 months ago

Thank you for your response. I would like to confirm the rationale behind using the formula Z = t_cz / (1 - tau) from the paper for dynamic objects. In my opinion, when both the camera and the object are in motion, it should be Z = (t_cz - t_mo) / (1 - tau), where t_cz represents the camera motion and t_mo represents the object motion. This formulation seems more reasonable, as neglecting t_mo could lead to significant errors. Additionally, I believe that the errors you mention in the paper when obtaining depth through triangulation on the object are also due to the object's motion, which makes accurate triangulation difficult. @gengshan-y

gengshan-y commented 4 months ago

Hi, after breaking the dynamic scene into rigid pieces, t_cz would become the relative motion between the camera and the rigid piece. Such relative motion can be computed as t_cz - t_mo.
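A small numerical sketch of this point, assuming tau = Z'/Z and that both motions are measured along the optical axis with the same sign convention, so that Z' = Z - t_cz + t_mo. The function name and the example values are hypothetical.

```python
import math


def depth_from_tau(tau, t_cz, t_mo=0.0):
    """Z = (t_cz - t_mo) / (1 - tau), using the camera-to-piece relative motion."""
    return (t_cz - t_mo) / (1.0 - tau)


# Ground truth: a piece at depth 10; the camera advances by 1.0 while the
# piece moves 0.4 in the same direction, so the gap closes by only 0.6.
z_true, t_cz, t_mo = 10.0, 1.0, 0.4
z_next = z_true - t_cz + t_mo
tau = z_next / z_true

with_object_motion = depth_from_tau(tau, t_cz, t_mo)   # recovers about 10
camera_motion_only = depth_from_tau(tau, t_cz)         # overestimates the depth
```

Using only the camera motion gives a depth of roughly 16.7 instead of 10 in this example, which is the kind of error raised in the question.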

If I understand your second point correctly, our analysis in Sec 4.5 is making a different point from the one about dynamic objects. When the scene or piece is rigid, triangulating correspondences near the epipole leads to large depth error, because of the small "baseline".

Please feel free to follow up if you have additional questions.