Problem when train from scratch

CVLAB-Unibo / Real-time-self-adaptive-deep-stereo

Code for "Real-time self-adaptive deep stereo" - CVPR 2019 (ORAL)

Apache License 2.0


Problem when train from scratch #63

Open Tiam271 opened 3 years ago

Tiam271 commented 3 years ago

Hello, great work @AlessioTonioni and team! I am trying to train on custom data that has sparse disparity maps (16-bit), as shown below: [image: 2018-07-09-16-11-56_2018-07-09-16-13-31-760]

I did not modify any code. However, as training progresses, the disparity predicted by the network always looks like this (except for the first frame): [image: issue]

Do you know why? Can you help me?

AlessioTonioni commented 3 years ago

What is the range of the disparities? Are you starting from random initialization or from a set of pre-trained weights?

Tiam271 commented 3 years ago

The disparities range between 0 and 192. Training starts from random initialization. Do you mean that if I start training from random initialization, I should use dense ground truth?

AlessioTonioni commented 3 years ago

Generally speaking, yes, training is more stable with full disparities, or at least not very sparse ones. Have you tried starting from the Flying Things 3D weights?

Some additional questions:

In the 16-bit disparity maps, the pixel intensities are the disparity values multiplied by 256, right (i.e. as done in KITTI)?
Which value are you using in the GT for pixels without a disparity?

Tiam271 commented 3 years ago

The 16-bit disparity map is encoded exactly as in KITTI.
The value in the GT for pixels without a disparity is set to zero.

When I start from the pre-trained weights (Flying Things 3D), this problem does not occur. I understand now: the problem arises because the supervision is too sparse. Thank you for your answer!
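For readers following the thread, the encoding agreed on above (16-bit values equal to disparity times 256, with zero marking pixels without ground truth) can be sketched as follows. This is a minimal NumPy illustration, not code from the repository: the function names `decode_kitti_disparity` and `supervision_density` are hypothetical, and the toy array stands in for a PNG loaded with any reader that preserves 16-bit depth.

```python
import numpy as np


def decode_kitti_disparity(raw_u16):
    """Decode a KITTI-style uint16 disparity image (hypothetical helper).

    Returns (disparity, valid): disparity in pixels as float32, and a
    boolean mask that is False where the stored value is 0 (no GT).
    """
    raw = np.asarray(raw_u16, dtype=np.uint16)
    valid = raw > 0                              # 0 encodes "no ground truth"
    disparity = raw.astype(np.float32) / 256.0   # undo the x256 scaling
    return disparity, valid


def supervision_density(valid):
    """Fraction of pixels that carry a ground-truth disparity."""
    return float(np.mean(valid))


# Toy 2x3 array standing in for a loaded 16-bit PNG (assumption).
raw = np.array([[0, 256, 512],
                [0, 0, 49152]], dtype=np.uint16)

disp, valid = decode_kitti_disparity(raw)
density = supervision_density(valid)

print(disp)      # decoded disparities in pixels
print(density)   # share of supervised pixels
```

Checking the density over a few training frames gives a rough sense of how sparse the supervision actually is before deciding whether to train from scratch.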

AlessioTonioni commented 3 years ago

I see, so the encoding of the GT seems OK. As you said, the predictions are probably collapsing because the supervision is too sparse. I would bet that by playing with the hyperparameters you might be able to train the network directly on the sparse data, but in general I think it's beneficial to start from the Flying Things 3D (F3D) weights. Within the same codebase you can also experiment with DispNet: even though it is slower, it is a much more stable model, and you might be able to successfully train it on sparse data from scratch (not sure, though).
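When experimenting with sparse ground truth, one thing worth ruling out is that pixels without a disparity (stored as zero) contribute to the loss as if zero were a real target. The sketch below shows a masked L1 loss in plain NumPy under that assumption. It is an illustration only, not the loss implemented in this repository, and `masked_l1_loss` is a hypothetical name.

```python
import numpy as np


def masked_l1_loss(pred, gt, valid):
    """Mean absolute error over pixels where `valid` is True.

    Hypothetical helper: returns 0.0 when no pixel has ground truth,
    so an empty frame does not produce a NaN.
    """
    pred = np.asarray(pred, dtype=np.float32)
    gt = np.asarray(gt, dtype=np.float32)
    valid = np.asarray(valid, dtype=bool)

    n_valid = int(valid.sum())
    if n_valid == 0:
        return 0.0
    # Boolean indexing keeps only the supervised pixels.
    return float(np.abs(pred - gt)[valid].sum() / n_valid)


# Toy example: zeros in gt mean "no ground truth" (same convention as above).
gt = np.array([[0.0, 10.0],
               [0.0, 20.0]], dtype=np.float32)
valid = gt > 0
pred = np.array([[5.0, 12.0],
                 [7.0, 17.0]], dtype=np.float32)

loss = masked_l1_loss(pred, gt, valid)       # uses only the 2 valid pixels
naive = float(np.abs(pred - gt).mean())      # also penalizes unlabeled pixels

print(loss, naive)
```

In the toy example the naive average is larger because the two unlabeled pixels are penalized for differing from zero; with very sparse maps that term can dominate the error.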