TRI-ML / packnet-sfm

TRI-ML Monocular Depth Estimation Repository
https://tri-ml.github.io/packnet-sfm/
MIT License
1.24k stars 243 forks

Why are original (and not augmented) images used to calculate the loss? #103

Closed (zshn25 closed this issue 3 years ago)

zshn25 commented 3 years ago

https://github.com/TRI-ML/packnet-sfm/blob/2698f1fb27785275ef847f3dbbd550cf8fff1799/packnet_sfm/models/SelfSupModel.py#L89-L92

VitorGuizilini-TRI commented 3 years ago

We feed the augmented image to the depth network. For the photometric loss calculation we believe there is no need to use it, since that calculation is appearance-based and augmentation could create artifacts that compromise matching across different viewpoints.
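The routing described in this reply can be sketched in a few lines. This is a toy illustration under stated assumptions, not the repository's code: `color_jitter`, `toy_depth_net`, `photometric_l1` and `training_step` are hypothetical helpers, the batch key names are illustrative, images are flat lists of intensities in [0, 1], and the view-synthesis warp is replaced by an identity placeholder.

```python
import random


def color_jitter(image, rng, max_gain=0.2):
    """Hypothetical augmentation: one random brightness gain for the whole image."""
    gain = 1.0 + rng.uniform(-max_gain, max_gain)
    return [min(1.0, max(0.0, p * gain)) for p in image]


def toy_depth_net(image):
    """Stand-in for a depth network: maps each pixel to a positive value."""
    return [1.0 / (0.1 + p) for p in image]


def photometric_l1(target, warped):
    """Mean absolute difference between two images of equal size."""
    return sum(abs(t - w) for t, w in zip(target, warped)) / len(target)


def training_step(batch, rng):
    # The network only ever sees the augmented copy of the target image.
    augmented = color_jitter(batch["rgb_original"], rng)
    depth = toy_depth_net(augmented)

    # Placeholder: a real pipeline would warp the context image into the
    # target view using the predicted depth and pose. Here the warp is the
    # identity, so `depth` is returned but not used by the loss.
    warped_context = batch["rgb_context_original"]

    # The photometric loss compares the untouched images.
    loss = photometric_l1(batch["rgb_original"], warped_context)
    return depth, loss


batch = {
    "rgb_original": [0.2, 0.5, 0.8],
    "rgb_context_original": [0.25, 0.5, 0.7],
}
depth, loss = training_step(batch, random.Random(0))
print(depth, loss)
```

In this sketch the random jitter changes what the network receives from step to step, while the loss value depends only on the original images.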

zshn25 commented 3 years ago

But isn't SSIM invariant to the jitter augmentations?

VitorGuizilini-TRI commented 3 years ago

Sure, and in fact I don't think we observe any significant changes either way (raw or augmented images). This is mostly a design choice: since we have access to the raw, untouched images, it is cleaner to use them directly instead of the augmented ones.
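The point about SSIM and jitter can be probed numerically with a small sketch. It assumes the jitter is a pure brightness gain applied identically to both images with no clamping, and it computes SSIM over a single patch with the commonly used constants 0.01² and 0.03². The helper names and pixel values are made up for illustration and are not taken from the repository.

```python
def ssim_patch(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed over one patch (flat lists of intensities in [0, 1])."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def l1_patch(x, y):
    """Mean absolute difference between two patches."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Made-up 3x3 patches from a target image and a (warped) context image.
target = [0.20, 0.35, 0.50, 0.40, 0.30, 0.45, 0.60, 0.25, 0.55]
context = [0.22, 0.33, 0.52, 0.41, 0.28, 0.47, 0.58, 0.27, 0.53]

# Assumed jitter: the same brightness gain on both patches.
gain = 1.2
target_j = [gain * p for p in target]
context_j = [gain * p for p in context]

ssim_shift = abs(ssim_patch(target_j, context_j) - ssim_patch(target, context))
l1_ratio = l1_patch(target_j, context_j) / l1_patch(target, context)
print(ssim_shift, l1_ratio)
```

Under these assumptions the L1 term scales with the gain, while SSIM moves only slightly, because the gain cancels in its ratios up to the stabilizing constants. So the invariance is approximate rather than exact, and other jitter types (contrast, saturation, hue) are not covered by this sketch.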

zshn25 commented 3 years ago

Yes, makes sense to use it while you already have it. Thanks!