ClementPinard / SfmLearner-Pytorch

Pytorch version of SfmLearner from Tinghui Zhou et al.
MIT License

Weights for Explainability in PosExpNet #109

Closed Etienne-Meunier closed 2 years ago

Etienne-Meunier commented 3 years ago

Hi! Thank you very much for your work. Would you by any chance have computed weights for the explainability part of the network (the ConvTranspose2d weights)? I think they are missing from the available .pth file:

image
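For reference, a minimal sketch of how to list which parameter keys the checkpoint actually contains (the filename and the `'state_dict'` entry are assumptions based on how this repo saves checkpoints; adjust them to your download):

```python
import torch

# Hypothetical path to the published pose checkpoint; substitute your own location.
checkpoint = torch.load('exp_pose_model_best.pth.tar', map_location='cpu')

# Assumption: the checkpoint wraps its weights in a 'state_dict' entry.
state_dict = checkpoint.get('state_dict', checkpoint)

# Print every saved parameter name and shape, so missing explainability
# (ConvTranspose2d / upconv) weights become obvious.
for name, tensor in sorted(state_dict.items()):
    print(name, tuple(tensor.shape))
```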

Thanks !

ClementPinard commented 3 years ago

Hi, indeed those weights have not been computed. It turns out that even though explainability is a great idea, it gets beaten simply by more data augmentation and a stronger smoothness loss. In other words, the checkpoint network is trained without explainability, and you probably don't need it for KITTI.

On another dataset it might be useful, though! But then the network pretrained on KITTI might not help you.
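For anyone wanting to load that checkpoint, a minimal sketch (assuming the `output_exp` flag of `models/PoseExpNet.py` and the checkpoint layout; names may need adjusting):

```python
import torch
from models import PoseExpNet  # as exposed in this repo's models package

# With output_exp=False the explainability decoder is not built at all,
# so the checkpoint trained without explainability loads cleanly.
pose_net = PoseExpNet(nb_ref_imgs=2, output_exp=False)

weights = torch.load('exp_pose_model_best.pth.tar', map_location='cpu')
pose_net.load_state_dict(weights['state_dict'], strict=False)
pose_net.eval()
```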

alexlopezcifuentes commented 3 years ago

Hi!

I also have a question regarding the explainability mask, and maybe either @Etienne-Meunier or you, @ClementPinard, can answer it, as it's highly related to this issue.

What are the expected values for that mask during training? Are we expecting values near 1 on moving objects and areas where the estimated transformation is not correct? Or, on the contrary, are we expecting 0s?

Maybe my reasoning is wrong, but I would expect that if we want to remove moving objects from the photometric loss, as with the masking done here: https://github.com/ClementPinard/SfmLearner-Pytorch/blob/4e6b7e8b545f6e80c2714ba41231e5fafb1e803c/loss_functions.py#L36

then the mask should contain 0s in those moving areas, so that the loss is driven towards 0 and those pixels have no impact on the computed gradients and therefore do not affect training. However, when I train it, it seems to be the other way around, highlighting moving objects with 1s... Is there something I am missing or not properly understanding?
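For concreteness, a minimal sketch of the weighting I mean (illustrative names, mirroring the spirit of the linked line rather than copying it): a mask value near 0 should silence a pixel's photometric error.

```python
import torch

def masked_photometric_loss(tgt_img, ref_img_warped, explainability_mask=None):
    # Per-pixel photometric error between the target image and the
    # reference image warped into the target view.
    diff = (tgt_img - ref_img_warped).abs()
    if explainability_mask is not None:
        # Mask ~0 on "inexplicable" pixels (e.g. moving objects) removes their
        # contribution to the loss, and hence to the gradients.
        diff = diff * explainability_mask.expand_as(diff)
    return diff.mean()
```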

ClementPinard commented 3 years ago

Hello, you are right: the mask should be 0 where the photometric loss is "inexplicable". It's strange that it goes the other way around. Can you share a mask visualization?

alexlopezcifuentes commented 3 years ago

Thanks, Clement, as always, for your super fast answer. With that in mind, I could confirm that something was wrong with my masks. It turns out I had an issue with the colormap I used to save the explainability masks, a silly issue.

For static sequences (which I did not use for training), things go quite well and the mask properly highlights moving objects (blue colors represent lower values): Static Correct

For dynamic sequences, where there is actual camera movement, the mask highlights areas that are supposed to be rigid: Dynamic Incorrect

I think that's related to the network not properly learning a good transformation (as can be observed in the photometric loss). It might be mainly because I'm only using KITTI for training, which might not be enough data. But that's another issue :D

ClementPinard commented 3 years ago

First thought: yes, explainability is not essential for KITTI; even the original author dismissed it in his GitHub repo because better data augmentation did a better job.

On second thought, the explainability maps here seem to consist of large areas of the same color (either all black or all white). What we can conclude is that explainability is high wherever the pixel difference between the target and the warped reference is low. Contrary to what it's supposed to do, I don't think it highlights moving objects; it is unfortunately a bit trivial.

I would try to see whether this behaviour still occurs with a higher weight for the mask loss; that way, the mask has an incentive to stay high even in areas whose photometric error is not trivially low.
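As a sketch of what raising that weight does (variable names are illustrative; the regularizer shown is the usual binary cross-entropy pulling the mask toward 1 everywhere):

```python
import torch
import torch.nn.functional as F

def explainability_reg(mask):
    # Penalize masks that drift away from 1: without this term, the network
    # could set the mask to 0 everywhere and zero out the photometric loss.
    ones = torch.ones_like(mask)
    return F.binary_cross_entropy(mask, ones)

# total = p * photometric_loss + m * explainability_reg(mask) + s * smooth_loss
# Raising m pushes the mask to stay close to 1 except where the photometric
# error truly cannot be explained by the estimated motion.
```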

alexlopezcifuentes commented 3 years ago

Thanks Clement, I will try your suggestion of using a higher weight for the mask loss.

I know that for this particular method good data augmentation does a better job in terms of depth and pose estimation; however, I'm training this specifically to obtain motion masks for another task, which is why it is important to me.

If I still find that I cannot obtain good explainability mask results, I'll move to a more recent approach that outputs motion masks.

ClementPinard commented 3 years ago

Another idea that comes to mind is that you can saturate the mask activation when visualizing it. Just because trivial areas are twice as explainable doesn't mean the rest falls into the "not explainable" category. Trying to spot areas where explainability is very close to 0 might give you more insight.
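A minimal sketch of such a saturated visualization (assuming the mask is an HxW array of values in [0, 1], and using matplotlib's vmin/vmax clipping):

```python
import matplotlib.pyplot as plt

def show_low_explainability(mask, threshold=0.1):
    # Everything >= threshold saturates to the top of the colormap,
    # so only the near-zero ("not explainable") regions stand out.
    plt.imshow(mask, cmap='viridis', vmin=0, vmax=threshold)
    plt.colorbar()
    plt.title('explainability (clipped at {:.2f})'.format(threshold))
    plt.show()
```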

Anyway, good luck with your project !