fangchangma / sparse-to-dense

ICRA 2018 "Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image" (Torch Implementation)

extremely weird output from testing #12

Closed: yechenw closed this issue 5 years ago

yechenw commented 6 years ago

Hi,

I tried to test the pretrained model http://datasets.lids.mit.edu/sparse-to-dense/results/kitti.input=rgbd.nsample=500.rep=linear.encoder=conv.decoder=upproj.nDecoder=4.criterion=l1.lr=0.01.bs=16.pretrained=true/ using the command th main.lua -dataset kitti -inputType rgbd -nSample 500 -pretrain true -testOnly true

and here's the result:

[screenshot: test output metrics]

which is very different from the results reported in the paper.

I've re-downloaded and re-run the code several times, but it still shows the same result. It seems unlikely, but do you think there might be something wrong with the code? Thanks

fangchangma commented 6 years ago

Thanks for reporting the issue. It is likely a problem with the trained KITTI models, which might have been trained with a different version of the code. I'll look into it and make the necessary changes by the end of the week.

However, you can try to comment out this line and see if the test accuracy gets better.

fangchangma commented 6 years ago

I just tested the trained model and got DELTA1=0.94, so both the code and data were actually working properly.
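(For reference, δ₁ here is the standard threshold-accuracy metric used in the paper: the fraction of valid pixels whose predicted depth is within a factor of 1.25 of the ground truth, i.e., max(pred/gt, gt/pred) < 1.25.)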

Did you install the latest Torch? Is your torch-hdf5 installed from https://github.com/davek44/torch-hdf5.git, as instructed in the README.md file?

yechenw commented 6 years ago

Thanks for the help. I tried commenting out the normalization part as you suggested, but the result is still the same (the exact same numbers). Also, I did install torch-hdf5 from the link you provided, and I installed Torch itself from its website very recently. Is it possible that the version of something else (e.g., CUDA; mine is 9.2) is causing the problem?

yechenw commented 6 years ago

Also, I tried to train a model using the code with 200 depth samples, and here's the result (testlog.txt):

[screenshot: test log metrics]

fangchangma commented 6 years ago

If you get the exact same numbers both with and without normalization, then it is very likely that your system is somehow loading the data incorrectly (e.g., reading the images in as all zeros), or that the downloaded data is corrupted.

A sanity check would be to print the loaded images out in Torch. It would also be helpful to load the downloaded hdf5 files with other tools (e.g., Matlab or Python) and verify whether they contain meaningful values. Alternatively, you can try training a model with the PyTorch version of this code.
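For example, a minimal check with torch-hdf5 might look like the following (the file path is a placeholder, and the dataset keys '/rgb' and '/depth' are assumed from the data format, not confirmed):

```lua
-- minimal sanity check for one downloaded h5 file; the path is a placeholder
-- and the dataset keys '/rgb' and '/depth' are assumptions about the format
local hdf5 = require 'hdf5'

local f = hdf5.open('path/to/sample.h5', 'r')
local rgb = f:read('/rgb'):all()      -- full RGB tensor
local depth = f:read('/depth'):all()  -- full depth tensor
f:close()

-- healthy data should show a sensible value range, not all zeros
print(rgb:size(), rgb:min(), rgb:max())
print(depth:size(), depth:min(), depth:max())
```

If both tensors print as all zeros, the problem is in the data or the hdf5 bindings rather than the model.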

yechenw commented 6 years ago

Thanks, I will try re-downloading the dataset. As for the PyTorch version, does it support training on the KITTI dataset? It says “Currently this repo only supports training on the NYU dataset” on that page.

fangchangma commented 6 years ago

The PyTorch code has already added support for KITTI.

yechenw commented 6 years ago

Hi, I tried to print out the depth data, and it looked like this:

[screenshot: printed depth tensor values]

In the code, there's a method called setZeroToNan(). Could you please explain why that step is necessary? Thanks

fangchangma commented 6 years ago

From the look of it, the data might have been loaded properly. We masked unavailable pixels in the depth images as NaN, since that simplified data pre-processing for other depth representations (e.g., inverse depth, log depth). Now that I think about it, this can be error-prone, since different libraries (numpy vs. CUDA, and different versions of CUDA) might handle NaN values differently. That could potentially be the culprit.
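Conceptually, the masking amounts to something like this (a sketch of the idea, not the exact code in the repo):

```lua
-- sketch: pixels with no depth measurement are stored as 0 in the h5 files
-- and get replaced with NaN so they drop out of losses and error metrics
local function setZeroToNan(depth)
  depth[depth:eq(0)] = 0/0  -- 0/0 evaluates to NaN in Lua
  return depth
end
```

If any of those NaNs leak into the loss computation instead of being masked out, they propagate through the gradients and training stalls.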

You can try commenting out setZeroToNan() and see if you get more meaningful results. If so, please create a pull request and I'll merge it.

yechenw commented 6 years ago

Sorry for the late reply. I commented out setZeroToNan() and it still doesn't work. I also tried training with the 'rgb' option, and it leads to a similar result as above (basically, the loss won't update). I guess the code is highly dependent on the versions of CUDA and the other packages. I will try the PyTorch version of the code and see what happens. Thanks for the help!