Why is self-supervised worse than supervised?

fangchangma / self-supervised-depth-completion

ICRA 2019 "Self-supervised Sparse-to-Dense: Self-supervised Depth Completion from LiDAR and Monocular Camera"

MIT License


Issue #26

Closed: kwea123 closed this issue 5 years ago

kwea123 commented 5 years ago

In Figure 6b on page 9 of the paper, at the rightmost point, the self-supervised method receives semi-dense LiDAR ground truth, so its depth term is no longer a "sparse depth loss". I don't understand why it performs worse than the supervised method, which has the same ground-truth supervision. The self-supervised method also has additional losses, such as the photometric loss, so in my opinion it should perform at least as well as the supervised one.

How do you explain this?

fangchangma commented 5 years ago

The supervised method uses the semi-dense ground truth provided by the KITTI benchmark, which is built by aggregating 11 consecutive frames and covers roughly 30% of pixels. In comparison, the self-supervised method uses only the "sparse depth loss", i.e., a loss against the input LiDAR scans (~4% of pixels).
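To make the difference in supervision concrete, here is a minimal Python sketch, assuming the depth term is a squared error masked to pixels that have a valid (non-zero) depth value. The helpers `masked_depth_loss` and `synthetic_depth_map` are hypothetical, and the depth maps are random synthetic data rather than KITTI; only the ~4% and ~30% densities come from the reply above. The sketch counts how many pixels each depth term actually supervises.

```python
import random


def masked_depth_loss(pred, target):
    """Mean squared error over pixels whose target depth is valid (> 0).

    Returns (loss, number_of_supervised_pixels). Pixels with target == 0
    carry no measurement and contribute nothing to the loss.
    """
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if t > 0:
                total += (p - t) ** 2
                count += 1
    return (total / count if count else 0.0), count


def synthetic_depth_map(height, width, density, rng):
    """Hypothetical depth map: each pixel holds a depth in [1, 80] with
    probability `density`, otherwise 0 (no measurement)."""
    return [
        [rng.uniform(1.0, 80.0) if rng.random() < density else 0.0 for _ in range(width)]
        for _ in range(height)
    ]


rng = random.Random(0)
H, W = 120, 400  # illustrative size only, not the KITTI resolution
n_pixels = H * W

sparse_input = synthetic_depth_map(H, W, 0.04, rng)   # ~4%: single LiDAR scan
semi_dense_gt = synthetic_depth_map(H, W, 0.30, rng)  # ~30%: 11 aggregated frames
prediction = [[20.0] * W for _ in range(H)]           # dummy dense prediction

sparse_loss, sparse_count = masked_depth_loss(prediction, sparse_input)
dense_loss, dense_count = masked_depth_loss(prediction, semi_dense_gt)

print(f"sparse depth term:     {sparse_count} px ({sparse_count / n_pixels:.1%})")
print(f"semi-dense depth term: {dense_count} px ({dense_count / n_pixels:.1%})")
```

With the same prediction, the semi-dense depth term receives a direct depth target at roughly 30% / 4% ≈ 7.5 times as many pixels as the sparse one.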

kwea123 commented 5 years ago

Oh, I see, so it's basically 11 frames vs. 1 frame. Thank you for the clarification.