appliedinnovation / fast-depth

ICRA 2019 "FastDepth: Fast Monocular Depth Estimation on Embedded Systems"
MIT License

Training stagnates and cannot predict fine-scale features #2

Open alexbarnett12 opened 3 years ago

alexbarnett12 commented 3 years ago

After over a month of training experiments, moderately good results have been achieved. Below are some examples of good predictions (attached images: decent_results_train_image_0, decent_results_val_image_0, decent_results_val_image_1).

However, across all images the model is unable to predict fine-scale features, particularly in the depth range of 0 to 1.0 meters. This is a problem, since fine-scale features are the most important data to predict for obstacle navigation. For example, in the image below, the model predicts nearly a single value across all pixels and still achieves a low RMSE, since the per-pixel error stays small (attached image: decent_results_train_image_1).
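To make that RMSE insensitivity concrete, here is a small numerical sketch (the depth range and values are illustrative, not taken from our data) showing that a constant prediction over a narrow close-range scene already scores a low RMSE:

```python
import numpy as np

# Illustrative only: ground-truth depths clustered in the 0.5-1.0 m range,
# as in the close-range scenes above (values are made up for this example).
gt = np.random.uniform(0.5, 1.0, size=(224, 224))

# A "lazy" prediction: a single constant value for every pixel.
pred = np.full_like(gt, gt.mean())

rmse = np.sqrt(np.mean((pred - gt) ** 2))
print(f"RMSE of a constant prediction: {rmse:.3f} m")  # ~0.14 m, already "low"
```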

As a baseline, I trained on the NYU Depth dataset to try to replicate the results in the FastDepth paper. Quantitatively, the results were very good, with a test RMSE of 0.75 and delta1 of 0.7. However, the model still couldn't predict fine-scale objects (attached image: nyu_train_image_0).

My intuition from these results is that more simulation data by itself won't solve the problem. I see a few options going forward:

My first steps going forward will be to increase our training dataset and try some different loss functions. If that leads nowhere, then I will revisit some of the paths above.

finger563 commented 3 years ago

Thanks for the detailed writeup @alexbarnett12 👍

finger563 commented 3 years ago

Based on our discussion today:

These two options (which should be roughly equivalent, though perhaps the first is better numerically) should help us considerably with close depth values.

However, it may run into issues, since inverse depths larger than 1 meter map to values less than 1 and therefore contribute very little to the loss. We may not want to use the inverse strictly as 1 / x, but instead apply some scalar so that it is more like 10 / x, where 0-10 m is the depth range we are primarily interested in.
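A minimal sketch of how such a scaled inverse-depth loss could look in PyTorch; the scale factor, epsilon guard, and L1 base are assumptions for illustration, not something taken from the repo:

```python
import torch
import torch.nn.functional as F

def scaled_inverse_depth_l1(pred, target, scale=10.0, eps=1e-3):
    """L1 loss computed in scaled inverse-depth space.

    pred, target: depth maps in meters, shape (N, 1, H, W).
    scale:        keeps the 0-10 m range of interest above 1 in inverse space,
                  so close-range errors are weighted more heavily.
    eps:          guards against division by zero at (near-)zero depth.
    """
    inv_pred = scale / pred.clamp(min=eps)
    inv_target = scale / target.clamp(min=eps)
    return F.l1_loss(inv_pred, inv_target)

# Example: a 0.1 m error at 0.5 m depth costs far more than the same error at 5 m.
t = torch.tensor([[[[0.5, 5.0]]]])
p = torch.tensor([[[[0.6, 5.1]]]])
print(scaled_inverse_depth_l1(p, t))  # dominated by the close-range pixel
```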

alexbarnett12 commented 3 years ago

Another idea would be to add a component to the loss function based on the delta1 metric, i.e. the percentage of pixels whose prediction is within a factor of 1.25 of ground truth. It's a good measure of whether the error is spread across the entire image (as it currently is) or concentrated in a few outlier pixels (not ideal, but better than the alternative). I don't know if this would work as a loss function by itself, but it could potentially be computed in conjunction with L1.
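A rough sketch of what that might look like: delta1 itself is a hard count and not differentiable, so the loss term below uses a smooth surrogate (a sigmoid on the absolute log depth ratio) added to L1. The threshold, weight, sharpness, and surrogate choice are all assumptions for illustration, not the repo's method:

```python
import torch
import torch.nn.functional as F

def delta1_metric(pred, target, ratio=1.25, eps=1e-3):
    """Standard delta1: fraction of pixels with max(pred/gt, gt/pred) < ratio."""
    pred = pred.clamp(min=eps)
    target = target.clamp(min=eps)
    r = torch.max(pred / target, target / pred)
    return (r < ratio).float().mean()

def l1_plus_soft_delta1(pred, target, ratio=1.25, weight=0.5, sharpness=10.0, eps=1e-3):
    """L1 loss plus a smooth penalty on pixels falling outside the delta1 ratio.

    The sigmoid term approaches 1 for pixels whose absolute log depth ratio
    exceeds log(ratio), approximating "1 - delta1" in a differentiable way.
    """
    pred_c = pred.clamp(min=eps)
    target_c = target.clamp(min=eps)
    log_ratio = (torch.log(pred_c) - torch.log(target_c)).abs()
    soft_outliers = torch.sigmoid(
        sharpness * (log_ratio - torch.log(torch.tensor(ratio)))
    ).mean()
    return F.l1_loss(pred, target) + weight * soft_outliers
```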