EPFL-VILAB / omnidata

A Scalable Pipeline for Making Steerable Multi-Task Mid-Level Vision Datasets from 3D Scans [ICCV 2021]

There is an issue with the depth_euclidean / depth_zbuffer image data #32

Closed jeeyung closed 1 year ago

jeeyung commented 1 year ago

I downloaded the data by following the documented process (using omnidata-tools).

I checked the image files in 'depth_euclidean' and 'depth_zbuffer', and they differ from the example images you show. It seems the depth information is missing, since the images consist only of black and white (binary) pixels. Could you please check this?

alexsax commented 1 year ago

Hi @jeeyung, the depth images are single-channel images with 16-bit depth (documented e.g. here).

Perhaps you are reading them as 8-bit? The included dataloaders should work, and they show how to read the images correctly.
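As a minimal sketch of why an 8-bit read looks nearly binary (the values here are made up for illustration, not actual pixels from the dataset):

```python
import numpy as np

# Assumption for this sketch: depths are uint16 in 1/512 m units,
# and a typical indoor scene spans roughly 0.5 m .. 10 m.
depth_m = np.linspace(0.5, 10.0, 1000)
raw = (depth_m * 512).round().astype(np.uint16)   # raw values 256 .. 5120

# Naive 8-bit interpretation: everything above 255 saturates to white,
# so the image looks binary instead of showing a smooth gradient.
as_8bit = np.clip(raw, 0, 255).astype(np.uint8)
frac_saturated = float((as_8bit == 255).mean())

# A proper 16-bit read (e.g. np.asarray(Image.open(path)) with Pillow,
# which preserves 16-bit PNG values) recovers the full gradient.
recovered_m = raw.astype(np.float32) / 512.0
```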

jeeyung commented 1 year ago

Hello Alex, thank you for the reply! I did use the dataloader you mentioned.

[Image: point_1_view_3_domain_depth_euclidean] The image above is one example of the depth_euclidean data that I downloaded.

What I expected is similar to the below:

[Screenshot: example depth visualization, 2022-12-14]

I expected the downloaded images to contain gradually changing darkness. Am I missing something?

In addition, when I use the Taskonomy dataset for depth estimation, the test loss (L1) is nearly zero and the threshold accuracy (a standard depth-estimation metric: the percentage of pixels where max(ground_truth/prediction, prediction/ground_truth) < 1.25) is nearly 100%. That is why I am suspicious about the data.
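For reference, here is the threshold metric as I compute it (a minimal numpy sketch with made-up values; `delta1` is just my name for it, and I am assuming missing pixels are excluded via `mask`):

```python
import numpy as np

def delta1(pred, gt, mask):
    """Fraction of valid pixels with max(gt/pred, pred/gt) < 1.25."""
    pred, gt = pred[mask], gt[mask]
    ratio = np.maximum(gt / pred, pred / gt)
    return float((ratio < 1.25).mean())

# Made-up values: three of the four pixels fall within the 1.25 threshold.
gt = np.array([1.0, 2.0, 4.0, 8.0])
pred = np.array([1.1, 2.0, 6.0, 8.5])
mask = np.ones_like(gt, dtype=bool)
score = delta1(pred, gt, mask)
```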

alexsax commented 1 year ago

The images you showed from the website clip the max depth at something small (e.g. 1 m or 10 m) so that the visualizations look nice. Otherwise the missing values (white in the image) cause low dynamic range in the visualization. This practice is so common that people usually don't even mention it (similar to how false-color images are used when visualizing sensors for non-visible light).

Even in the image you sent from the dataloader, I can see some gradations of gray in the non-missing pixels. Try the clipping; I suspect you'll see what you're looking for.

For training, you also need to mask out missing pixels if you aren't already.
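A rough sketch of both steps; the missing-value sentinel (65535) and the 10 m clip threshold here are assumptions for illustration, not necessarily the dataset's exact conventions:

```python
import numpy as np

MISSING = 65535  # assumed uint16 sentinel for missing depth

raw = np.array([[256, 1024, 4096],
                [512, MISSING, 2048]], dtype=np.uint16)
valid = raw != MISSING

# Visualization: clip at e.g. 10 m (10 * 512 raw units) so near-range
# detail is visible instead of being crushed by the missing-value whites.
clip_raw = 10 * 512
vis = np.clip(raw, 0, clip_raw).astype(np.float32) / clip_raw  # [0, 1] for display

# Training: compute the loss only over valid (non-missing) pixels.
pred_m = np.full(raw.shape, 2.0, dtype=np.float32)   # dummy prediction, metres
gt_m = raw.astype(np.float32) / 512.0                # 1/512 m units -> metres
l1 = float(np.abs(pred_m[valid] - gt_m[valid]).mean())
```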

jeeyung commented 1 year ago

Thanks for the quick response.

My understanding is that the max depth of the Taskonomy images is 512 m, and people clip the max depth at 10 m for visualization. Is that correct?

alexsax commented 1 year ago

That's basically right, except that the max depth is 128 m (the units are 1/512 m, so the full uint16 range maps to 65535/512 ≈ 128 m). People clip at various amounts; what I usually do is set the threshold automatically to whatever the maximum non-missing depth is!
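In code, the unit conversion and the automatic threshold might look like this (a sketch; the 65535 missing sentinel is an assumption for illustration):

```python
import numpy as np

MISSING = 65535  # assumed uint16 sentinel for missing depth

raw = np.array([512, 2560, 5120, MISSING], dtype=np.uint16)
depth_m = raw.astype(np.float32) / 512.0   # 1/512 m units -> metres; 65535 -> ~128 m
valid = raw != MISSING

# Auto-threshold: clip the visualization at the maximum non-missing depth.
vmax = float(depth_m[valid].max())
vis = np.clip(depth_m, 0.0, vmax) / vmax   # [0, 1] for display
```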


jeeyung commented 1 year ago

Thanks for the clarification!!

jeeyung commented 1 year ago

Hello Again!

You mentioned that you clip at whatever the maximum non-missing depth is. But I found that you clamp outputs to the range [0, 1] in the test code: https://github.com/EPFL-VILAB/omnidata/blob/318a75569934737e67902f903531324d1f48ae8f/paper_code/test_depth.py#L207 Is this what I should follow to reproduce your performance? I got very low numbers on several depth-estimation metrics :(

I thought clipping was only for visualization. If we clip the prediction and ground truth to [0, 1], aren't we ignoring too much of each image, e.g. everything in the range (1, 128)?
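For concreteness, here is the unit arithmetic I am assuming (raw uint16 depths in 1/512 m units): if the targets are normalized by 2^16 before the clamp, then 1.0 already corresponds to 128 m, so whether [0, 1] truncates anything depends on the normalization applied first.

```python
import numpy as np

# Assumption: raw depths are uint16 in 1/512 m units.
raw = np.array([512.0, 5120.0, 65535.0])   # 1 m, 10 m, ~128 m (max/missing)

# If the network's targets are raw values normalized by 2**16 ...
norm = raw / 2**16

# ... then a normalized value of 1.0 corresponds to the full 128 m range,
# and clamping predictions to [0, 1] discards essentially nothing.
meters_at_one = 2**16 / 512.0
```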

puyiwen commented 1 year ago

You mentioned that you clip at whatever the maximum non-missing depth is. But I found that you clamp outputs to the range [0, 1] in the test code.

I have the same question. How did you solve it? Thank you very much!

Twilight89 commented 1 year ago

You mentioned that you clip at whatever the maximum non-missing depth is. But I found that you clamp outputs to the range [0, 1] in the test code.

I have the same question. How did you solve it? Thank you very much!

Hi, did you solve this problem?