Currently, the model's performance is evaluated with only a few metrics (RMSE, MAE, and Delta1) and by qualitatively analyzing a few images. This has been sufficient to produce good results so far, but further refining the model requires a more thorough evaluation procedure.
- [x] Log all relevant metrics to Comet (RMSE, Sq Rel, etc.); see the logging sketch after this list
- [x] Add error map logging to Comet (covered in the same sketch below)
- [x] Create a standard test folder that has a variety of challenging simulation and real-world scenes
- [x] Create a test_visual/ subsample with a handful of images (50 or so) that will be visually inspected for each model
- [x] Create an evaluate_quality script to save standardized image comparisons of each model + GT on test_visual/ (sketched after this list)
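
The first two items could look roughly like the sketch below. This is a minimal illustration, assuming a configured comet_ml Experiment and numpy depth maps; the function names, metric names, and error-map normalization are placeholders, not the project's actual conventions:

```python
# Sketch of the Comet metric + error map logging (assumed helper names).
import numpy as np
from comet_ml import Experiment


def log_depth_metrics(experiment: Experiment, pred: np.ndarray, gt: np.ndarray, step: int) -> None:
    """Log the standard depth-estimation metrics for one prediction."""
    diff = pred - gt
    thresh = np.maximum(pred / gt, gt / pred)
    experiment.log_metrics(
        {
            "rmse": float(np.sqrt(np.mean(diff ** 2))),
            "mae": float(np.mean(np.abs(diff))),
            "abs_rel": float(np.mean(np.abs(diff) / gt)),
            "sq_rel": float(np.mean(diff ** 2 / gt)),
            "delta1": float(np.mean(thresh < 1.25)),
        },
        step=step,
    )


def log_error_map(experiment: Experiment, pred: np.ndarray, gt: np.ndarray, name: str, step: int) -> None:
    """Log the per-pixel absolute error as an image artifact."""
    err = np.abs(pred - gt)
    err = (err / (err.max() + 1e-8) * 255).astype(np.uint8)  # per-image normalization (an arbitrary choice)
    experiment.log_image(err, name=name, step=step)
```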
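The evaluate_quality script could follow the shape below: for each image in test_visual/, save a side-by-side panel of each model's prediction next to the ground truth. The file layout, the .npy ground-truth format, and the per-model predict callables are all hypothetical:

```python
# Hypothetical sketch of evaluate_quality: standardized comparison panels.
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np


def save_comparisons(test_visual_dir: Path, out_dir: Path, models: dict) -> None:
    """models maps a display name to a callable: RGB image -> depth map (assumed interface)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for rgb_path in sorted(test_visual_dir.glob("*.png")):
        rgb = plt.imread(rgb_path)
        gt = np.load(rgb_path.with_suffix(".npy"))  # assumed GT format
        panels = [("Input", rgb), ("GT", gt)]
        panels += [(name, predict(rgb)) for name, predict in models.items()]
        fig, axes = plt.subplots(1, len(panels), figsize=(4 * len(panels), 4))
        for ax, (title, img) in zip(axes, panels):
            # Depth maps are 2D, so give them a colormap; RGB inputs render as-is.
            ax.imshow(img, cmap=None if img.ndim == 3 else "inferno")
            ax.set_title(title)
            ax.axis("off")
        fig.savefig(out_dir / f"{rgb_path.stem}_comparison.png", bbox_inches="tight")
        plt.close(fig)
```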