kwea123 closed this issue 5 years ago
The supervised method uses the semi-dense ground truth (provided by the KITTI benchmark by aggregating 11 consecutive frames, covering roughly 30% of pixels). In comparison, the self-supervised method uses only a "sparse depth loss", i.e., it is supervised by the input lidar scans themselves (~4% of pixels).
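To make the difference in supervision density concrete, here is a minimal NumPy sketch of a masked depth loss. It is an illustration only, not the code from the repository: the function names, the convention that 0 marks a pixel without a measurement, the squared-error form, and the synthetic depth maps are all assumptions. The point is only that the same loss is averaged over roughly 4% of pixels in one case and roughly 30% in the other.

```python
import numpy as np


def masked_depth_loss(pred, target):
    """Mean squared error over pixels that carry a depth measurement.

    Assumption: a target value of 0 means "no measurement at this pixel".
    """
    valid = target > 0
    if not valid.any():
        return 0.0
    diff = pred[valid] - target[valid]
    return float(np.mean(diff ** 2))


def subsample(depth, keep_fraction, rng):
    """Keep a random fraction of pixels and zero out the rest (hypothetical helper)."""
    mask = rng.random(depth.shape) < keep_fraction
    return np.where(mask, depth, 0.0)


rng = np.random.default_rng(0)
h, w = 64, 64

# Synthetic stand-ins for a dense depth map and a noisy network prediction.
dense = rng.uniform(1.0, 80.0, size=(h, w))
pred = dense + rng.normal(0.0, 1.0, size=(h, w))

# ~4% of pixels: comparable to a single lidar scan (the self-supervised case).
sparse_target = subsample(dense, 0.04, rng)
# ~30% of pixels: comparable to the aggregated semi-dense ground truth.
semi_dense_target = subsample(dense, 0.30, rng)

print("supervised pixels (sparse):    ", int((sparse_target > 0).sum()))
print("supervised pixels (semi-dense):", int((semi_dense_target > 0).sum()))
print("loss on sparse target:    ", masked_depth_loss(pred, sparse_target))
print("loss on semi-dense target:", masked_depth_loss(pred, semi_dense_target))
```

In both cases the loss has the same form; what changes is how many pixels contribute a gradient signal.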
Oh I see, so it's basically 11 frames vs. 1 frame. Thank you for the clarification.
In Figure 6b on page 9 of the paper, at the rightmost point, the self-supervised method receives the semi-dense lidar ground truth, which is no longer a "sparse depth loss". I don't understand why it performs worse than the supervised method, which has the same ground-truth supervision. The self-supervised one has additional losses, such as the photometric loss, so in my opinion it should perform at least as well as the supervised one.
How do you explain this?