alexklwong / calibrated-backprojection-network

PyTorch Implementation of Unsupervised Depth Completion with Calibrated Backprojection Layers (ORAL, ICCV 2021)

Questionable depth map on VOID ground truth + inference #20

Closed DornAres closed 2 years ago

DornAres commented 2 years ago

Hey there! Thank you for the work! I tried it out on my own sparse depth map + rgb image, and it didn't perform too well at all. I also visualized some ground truth data from the VOID dataset, and found that some pointclouds look similarly bad. The copyroom folder looks fine, but the first image from _birthplace_ofinternet already looks bad. I can understand my own dataset could be problematic concerning sparse depth map resolution, but after looking at some ground truth data, I'm wondering if the problem lies elsewhere.

Any idea why my custom dataset would look like this? The "bad" ground truth from VOID still looks better than my results. Here is a link to the visualization of a VOID and a custom point cloud.

I ran kbnet on both a Python 3.7 venv with the given dependency versions and a 3.9 venv with newer library versions.

I visualized everything using Open3D:

        import cv2
        import numpy as np
        import open3d as o3d

        image_opencv = cv2.imread(image_file)
        image = o3d.io.read_image(image_file)
        depth_image = o3d.io.read_image(depth_file)
        K = np.loadtxt(intrinsics_file)
        # PinholeCameraIntrinsic expects (width, height, fx, fy, cx, cy),
        # while a cv2 image's shape is (height, width, channels)
        height, width = image_opencv.shape[:2]
        intrinsic = o3d.camera.PinholeCameraIntrinsic(
            width, height, K[0][0], K[1][1], K[0][2], K[1][2])
        rgbd_image = o3d.geometry.RGBDImage.create_from_color_and_depth(
            image, depth_image, convert_rgb_to_intensity=False)
        pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd_image, intrinsic)
        o3d.visualization.draw_geometries([pcd])
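As a sanity check independent of Open3D, the depth map can also be backprojected by hand with NumPy. This is a minimal sketch, not part of the thread's code: `backproject` is a hypothetical helper that assumes a metric (H, W) depth map where zeros mark missing pixels and a 3x3 pinhole intrinsics matrix.

```python
import numpy as np

def backproject(depth, K):
    """Backproject an (H, W) metric depth map into an (N, 3) point cloud,
    dropping pixels with zero depth (missing measurements)."""
    h, w = depth.shape
    # v indexes rows (y), u indexes columns (x)
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    z = depth.reshape(-1)
    valid = z > 0
    u = u.reshape(-1)[valid]
    v = v.reshape(-1)[valid]
    z = z[valid]
    # Standard pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=-1)

# Toy example: a 2x2 depth map with one missing pixel
K = np.array([[500.0, 0.0, 1.0],
              [0.0, 500.0, 1.0],
              [0.0, 0.0, 1.0]])
depth = np.array([[1.0, 0.0],
                  [2.0, 4.0]])
points = backproject(depth, K)
print(points.shape)  # (3, 3): the zero-depth pixel is dropped
```

Comparing this cloud against the Open3D one quickly reveals swapped width/height or misread intrinsics.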
alexklwong commented 2 years ago

Hi, thanks for the interest! Just to make sure: is the visualization you are showing the ground truth? Are you using the depth maps unzipped after downloading them from Google Drive?

Also, depth maps are loaded and stored using data_utils. Have you tried that? https://github.com/alexklwong/calibrated-backprojection-network/blob/master/src/data_utils.py#L123
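For reference, the linked loader follows the common convention of storing metric depth as 16-bit PNGs scaled by a constant (256 is my recollection of the VOID convention; verify against the linked data_utils). A minimal sketch of that round trip, with hypothetical helper names:

```python
import numpy as np
from PIL import Image

def save_depth(depth, path, multiplier=256.0):
    # Store metric depth as a 16-bit PNG, scaled to preserve precision
    z = (depth * multiplier).astype(np.uint16)
    Image.fromarray(z).save(path)

def load_depth(path, multiplier=256.0):
    # Invert the scaling to recover metric depth as float32
    z = np.array(Image.open(path), dtype=np.float32)
    return z / multiplier
```

Reading the PNG with a generic image loader (or the wrong multiplier) yields depth off by a constant factor, which also produces strange-looking point clouds.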

alexklwong commented 2 years ago

Also, one thing to note is that the depth maps have "holes" of missing regions due to occlusions, reflectivity, etc., which when backprojected will create "spikes" towards the camera. So it helps to interpolate the missing regions using something like: https://github.com/alexklwong/unsupervised-depth-completion-visual-inertial-odometry/blob/master/src/data_utils.py#L178

before you visualize them.
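The linked helper does roughly the following. This is a simplified sketch using SciPy's `griddata` with nearest-neighbor interpolation only; the actual function in the repo is more involved (validity maps, linear interpolation with nearest-neighbor fill at the borders):

```python
import numpy as np
from scipy.interpolate import griddata

def interpolate_depth(depth, eps=1e-8):
    """Fill zero-valued holes in a depth map by nearest-neighbor
    interpolation from the valid measurements."""
    valid = depth > eps
    if valid.all() or not valid.any():
        return depth
    rows, cols = np.nonzero(valid)
    grid_r, grid_c = np.mgrid[0:depth.shape[0], 0:depth.shape[1]]
    filled = griddata(
        (rows, cols), depth[valid], (grid_r, grid_c), method="nearest")
    return filled.astype(depth.dtype)

depth = np.array([[1.0, 1.0],
                  [0.0, 3.0]])
print(interpolate_depth(depth)[1, 0])  # hole filled from a valid neighbor
```

With the holes filled, backprojection no longer produces the spikes toward the camera described above.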

DornAres commented 2 years ago

Ah that seems to explain the spikes in the ground truth, thanks!

The other matter sadly still persists:

Left: Ground truth depth map (admittedly, without interpolation), Right: Output depth map after inference.

Dataset: VOID/copyroom1 (1st image), and yes, unzipped after downloading from Google Drive.
Weights: kbnet-void1500.pth

Obviously I tried it out on different images as well. Not sure what I could've done wrong, since I simply executed run_kbnet_void1500.sh. I have now loaded the visualization using your data_utils, not that it makes a difference in the above case.

edit: I just now remembered some changes I had made, but they still shouldn't impact the depth completion, as far as I know. But maybe you know more:

  1. I changed the load_image_triplet method to only return the image as the 2nd variable. Because only 1 image was being loaded, I was getting the array splitting error.
  2. I also originally used @rakshith95's fork for single image inference, but re-ran everything with the original repository just to be sure, and the results are still the same. The fork doesn't differ much in the execution process, just in preprocessing.
  3. I didn't run the setup script, since I didn't want the entire VOID dataset extracted onto my home partition. So after trying out the single image inference from the fork (which worked), I used this repo, created testing/void/void_test_image_1500.txt myself, and put in 2-3 image paths, and did the same for the ground truth, intrinsics, sparse depth, etc. The results were the same as the ones from single image inference.


edit2: I am now convinced it's a library version / dependency issue, perhaps similar to the one mentioned in another thread.

alexklwong commented 2 years ago

I think this may be a pytorch version issue, for instance: https://github.com/alexklwong/calibrated-backprojection-network/issues/7#issuecomment-998527719 https://github.com/alexklwong/calibrated-backprojection-network/issues/8#issuecomment-1095340419

Which versions of Ubuntu, CUDA, and PyTorch are you using? Also, which GPU?

DornAres commented 2 years ago

I also think that's it, although I've tried out some options already.

Ubuntu 20.04, RTX 3080, CUDA Driver version 510.47.03, CUDA Toolkit 11.3

I tried out the following venvs:

Python 3.7, torch==1.8.2+cu111 (the 2nd installation route from the README)
Python 3.9, torch==1.8.2+cu111
Python 3.9, torch==1.11.0+cu113

I'm considering downgrading to CUDA 11.1, but I'm not sure it's worth the hassle, considering the toolkit should be compatible with the cu111 torch binaries and the fact that I still get the same results using torch 1.11.0+cu113.

alexklwong commented 2 years ago

Hi, we did a number of experiments with different CUDA and pytorch versions.

What we found is that, for the RTX family, the combinations that seem to work well are:

Python 3.8, torch==1.8.0+cu111
Python 3.8, torch==1.9.1+cu111

In fact, it looks like we under-reported (or that they fixed some computation), because both runs improved over our reported results.

Retraining with Python 3.8, torch==1.8.0+cu111

MAE      RMSE      iMAE     iRMSE
38.023    93.240    20.502    49.942

Retraining with Python 3.8, torch==1.9.1+cu111

MAE      RMSE      iMAE     iRMSE
38.292    95.003    20.657    49.614

1.8.1 will not work because they introduced a bug in indexing slices, causing errors to be thrown. Note that 1.8.0 and 1.9.1 are also compatible with CUDA 11.3.
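If you want to guard against the broken release programmatically, here is a hypothetical check that parses `torch.__version__`-style strings (pure Python, so it runs without touching torch itself):

```python
def parse_version(version):
    """Split a version string like '1.8.2+cu111' into a release tuple
    and its local identifier ('' if absent)."""
    base, _, local = version.partition("+")
    release = tuple(int(p) for p in base.split(".")[:3])
    return release, local

def is_known_bad(version):
    # 1.8.1 is the release reported above to break on slice indexing
    release, _ = parse_version(version)
    return release == (1, 8, 1)

print(is_known_bad("1.8.1+cu111"))  # True
print(is_known_bad("1.8.0+cu111"))  # False
```

Dropping such a check at the top of a training script fails fast instead of producing silently wrong or crashing runs.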

DornAres commented 2 years ago

Thank you! It does seem to perform consistently better now.