Closed maskedmeerkat closed 4 years ago
Thank you very much, we are glad you are finding our codebase useful. We are gradually porting internal functionality to the public repository, other dataset loaders are on the list, but I am not sure when they will be added. About your questions:
I hope this helps!
Thanks for the fast reply. Okay, those are some good pointers that clarify all of my questions and give me some things to look into.
I will try it out.
Hi Vitor,
I looked into your suggestions. I think the currently published version cannot handle sparse depth maps. You can see this first of all in the results, where the depth looks okay along some horizontal lines in the image but goes to max range everywhere else (why it goes to max range rather than min range is also strange to me).
I then looked into the code and found that the "sparse" in "sparse-L1" is ignored in this function.
Could you check whether sparse-L1 is really supported in the published version, and if not, could you give me some pointers on how to adapt the code?
Moreover, have you experimented with the average spatial distance between context and target images? I am wondering whether I can use the NuScenes "samples" (their keyframes) as context, which are spatially far apart, or whether I have to use the intermediate sweeps. The sweeps don't come with pose information but are spatially much closer.
Thank you for your efforts.
The "sparse" part is taken into consideration here:
by masking out all the pixels in both predicted and ground-truth depth maps where the ground-truth depth is 0.0, i.e. keeping only pixels with ground-truth depth > 0.0. One thing I should mention is that the supervised loss is actually applied on the inverse depth maps, not on the depth maps. One of our next updates will address that and include the option of doing one or the other, but in the meantime you might want to "invert the inverse depth maps" back and see if that helps.
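The masking described above can be sketched as follows. This is an illustrative stand-in, not the repository's actual `SupervisedLoss` implementation; the function name and the use of `torch.nn.functional.l1_loss` are assumptions:

```python
import torch

def sparse_l1_on_inverse_depth(pred_inv_depth, gt_depth):
    """Sketch of a sparse L1 loss computed in inverse-depth space.

    pred_inv_depth and gt_depth are [B,1,H,W] tensors; pixels with
    gt_depth == 0.0 are treated as invalid and masked out, mirroring
    the masking described above.
    """
    mask = (gt_depth > 0.0).detach()            # keep only valid pixels
    gt_inv_depth = torch.zeros_like(gt_depth)
    gt_inv_depth[mask] = 1.0 / gt_depth[mask]   # invalid pixels stay 0.0
    return torch.nn.functional.l1_loss(pred_inv_depth[mask], gt_inv_depth[mask])
```

Boolean indexing with `mask` flattens both tensors to the valid pixels only, so the loss never sees the zero-filled holes in the sparse map.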
We have experimented with different contexts on KITTI, and there is definitely a limit to how far apart context images can be before training breaks. They also cannot be too close, otherwise there is not enough motion, so for each dataset there is a "sweet spot" where self-supervised training works. Also, you don't need pose to train, so you can try intermediate sweeps, both adjacent and with strides; some skipping might get you to the right baseline.
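Striding over intermediate sweeps to tune the baseline could be sketched like this (a hypothetical helper, not part of the repository):

```python
def context_indices(target_idx, stride=1, num_context=1):
    """Pick backward/forward context frame indices around a target frame.

    stride > 1 skips intermediate sweeps, increasing the effective
    baseline between target and context frames; num_context controls
    how many frames are taken on each side.
    """
    backward = [target_idx - stride * k for k in range(1, num_context + 1)]
    forward = [target_idx + stride * k for k in range(1, num_context + 1)]
    return backward, forward
```

Sweeping over a few `stride` values during data-loader setup is one way to search for the "sweet spot" mentioned above.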
First of all, thanks for your reply.
So you suggest I move the depth2inv(...) transform from here
https://github.com/TRI-ML/packnet-sfm/blob/f824ffceba46ae1c621e1bf22a35634d8b39207c/packnet_sfm/models/SemiSupModel.py#L102-L104
inside this loss function
https://github.com/TRI-ML/packnet-sfm/blob/f824ffceba46ae1c621e1bf22a35634d8b39207c/packnet_sfm/losses/supervised_loss.py#L123
and change the loss function to something like this:
```python
def calculate_loss(self, inv_depths, gt_depths):
    """
    Calculate the supervised loss.

    Parameters
    ----------
    inv_depths : list of torch.Tensor [B,1,H,W]
        List of predicted inverse depth maps
    gt_depths : list of torch.Tensor [B,1,H,W]
        List of GROUND-TRUTH DEPTH MAPS

    Returns
    -------
    loss : torch.Tensor [1]
        Average supervised loss for all scales
    """
    # COMPUTE INVERSE DEPTH MAPS HERE
    gt_inv_depths = depth2inv(gt_depths)
    # If using a sparse loss, mask invalid pixels for all scales
    if self.supervised_method.startswith('sparse'):
        for i in range(self.n):
            # USE GT DEPTH MAPS HERE INSTEAD OF INV DEPTH MAPS
            mask = (gt_depths[i] > 0.).detach()
            inv_depths[i] = inv_depths[i][mask]
            gt_inv_depths[i] = gt_inv_depths[i][mask]
    # Return per-scale average loss
    return sum([self.loss_func(inv_depths[i], gt_inv_depths[i])
                for i in range(self.n)]) / self.n
```
Did you manage to get it working? I don't think it makes any difference where you do the inversion (at the model level or at the loss level); it should work the same way. When we apply the depth2inv inversion, the invalid depth pixels are kept as 0.0, so you can still mask them out. Alternatively, you can apply inv2depth to the predicted inverse depth maps and keep the ground-truth depth maps untouched, so the loss is applied directly on depth maps.
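The alternative suggested above, converting predictions back to depth and leaving the ground truth untouched, could look roughly like this. The reciprocal with clamping is a stand-in for the repository's inv2depth; all names here are illustrative:

```python
import torch

def supervised_loss_in_depth_space(pred_inv_depth, gt_depth):
    """Sketch: convert predicted inverse depth back to depth and apply
    a sparse L1 loss directly in depth space. gt_depth is untouched;
    0.0 still marks invalid pixels and is masked out."""
    pred_depth = 1.0 / pred_inv_depth.clamp(min=1e-6)  # inv2depth stand-in
    mask = (gt_depth > 0.0).detach()
    return torch.nn.functional.l1_loss(pred_depth[mask], gt_depth[mask])
```

Since the ground truth never goes through an inversion, its zero-valued holes remain exactly 0.0 and the mask stays reliable.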
Yeah, I also found that the default values are indeed kept untouched, and hence changed everything back ^^ I also found a mistake of mine, after which the network currently seems to train properly. I can post the results in a couple of days. Currently, training on all NuScenes samples takes about 7 h per epoch on one GPU.
Without supervision, it still fails, so I want to try different context distances, or maybe even two backward and forward contexts. Maybe that will stabilize it.
I will keep posting my findings for other people trying to use NuScenes.
@maskedmeerkat Hi, I'm also thinking of using the NuScenes dataset for training. Can you please share your working NuScenes dataloader?
Thanks
Currently working on using different contexts. When it's working, I'll post an update here.
Hi @maskedmeerkat,
did you manage to get rid of the horizontal lines after all? I am encountering a similar problem training my own model on NuScenes.
Any help is appreciated :)
Hi,
first of all, I want to join the others in congratulating you on your great work and for even providing this well-documented code! Thanks.
I am currently trying to write a data loader for the NuScenes dataset. The data split files are formatted as follows:
sample_token | backward_context_png | forward_context_png
and the sample is then constructed using this routine.
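Parsing one line of a split file in the format sketched above could look like this. This is a hypothetical helper based only on the field layout described in the post, not an official NuScenes or packnet-sfm API:

```python
def parse_split_line(line):
    """Parse one split-file line of the form
    'sample_token | backward_context_png | forward_context_png'."""
    sample_token, backward_png, forward_png = [f.strip() for f in line.split('|')]
    return {
        'sample_token': sample_token,
        'backward_context': backward_png,
        'forward_context': forward_png,
    }
```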
However, my training results look as follows
One problem I encountered, though I am not certain whether it's related, is that I had to change the cfg.checkpoint.monitor entry from "loss" to "abs_rel", since the "loss" entry is not initialized when training is False.
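In a YAML config override, the workaround described above would look roughly like this (the key layout is a sketch based on the cfg.checkpoint.monitor path mentioned, not copied from the repository's default config):

```yaml
checkpoint:
    monitor: 'abs_rel'   # instead of 'loss', which stays uninitialized when training is False
```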
My questions are:
Thanks in advance for your time and help.