Open SZUshenyan opened 11 months ago
Hello! Are you interested in visualizing the depth map for human consumption, or do you want to convert it to a 16-bit integer grayscale format to work with other systems?
First, load your depth map:
>>> sample['metric_depth']
<tf.Tensor: shape=(720, 1280, 1), dtype=float32, numpy=
array([[[9.671875 ],
[9.65625 ],
[9.6640625],
...,
[5.2539062],
[5.2578125],
[5.2695312]],
...,
[[2.2324219],
[2.2304688],
[2.2304688],
...,
[4.0546875],
[4.0546875],
[4.0585938]],
[[2.2304688],
[2.2285156],
[2.2304688],
...,
[4.0429688],
[4.046875 ],
[4.046875 ]],
[[2.2304688],
[2.2285156],
[2.2304688],
...,
[4.0351562],
[4.0390625],
[4.0429688]]], dtype=float32)>
During training, it's common for models to consume tensors like this directly. However, if you're working with a framework that needs 16-bit integer grayscale .PNG files, you can convert them like this:
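import numpy as np
from PIL import Image

UINT16_MAX = 65535
# Cast to float32 first: float16 cannot represent values above 65504, so scaling to millimeters could overflow.
metric_depth_in_meters = sample['metric_depth'].numpy().squeeze().astype(np.float32)
metric_depth_in_mm = metric_depth_in_meters * 1000.0
# Clip to the uint16 range (0 to 65.535 m), then convert; astype truncates toward zero.
metric_depth_in_mm_uint16 = np.clip(metric_depth_in_mm, 0, UINT16_MAX).astype('uint16')
pil_image = Image.fromarray(metric_depth_in_mm_uint16)
assert pil_image.mode == 'I;16' # 16-bit grayscale; see https://pillow.readthedocs.org/handbook/concepts.html#concept-modes
pil_image.save('/tmp/image_16bit_gray.png')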
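If you later need to read such a file back, the reverse direction is a division by 1000. Here is a minimal round-trip sketch under a few assumptions: the small `depth_m` array is a made-up stand-in for a float16 depth map in meters (it is not SANPO data), and the temporary file path is only for illustration.

```python
import os
import tempfile

import numpy as np
from PIL import Image

# Hypothetical stand-in for a float16 depth map in meters (not real SANPO data).
depth_m = np.linspace(0.5, 40.0, num=12, dtype=np.float32).reshape(3, 4).astype(np.float16)

# Encode: meters -> millimeters -> uint16, following the conversion above.
depth_mm = np.clip(depth_m.astype(np.float32) * 1000.0, 0, 65535).astype(np.uint16)
path = os.path.join(tempfile.mkdtemp(), "depth_16bit_gray.png")
Image.fromarray(depth_mm).save(path)

# Decode: integer millimeters -> float32 meters.
# Depending on the Pillow version, the loaded array may be uint16 or int32.
decoded_mm = np.asarray(Image.open(path))
decoded_m = decoded_mm.astype(np.float32) / 1000.0

# Largest round-trip error in meters; truncating to whole millimeters
# should keep it at roughly 1 mm or less.
max_error_m = float(np.max(np.abs(decoded_m - depth_m.astype(np.float32))))
print(max_error_m)
```

Whatever reads the PNG needs to know that the unit is millimeters, since the file itself does not record it.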
Encoding .png files this way is common, but depths beyond about 65.5 meters must be clipped, which becomes an issue with the synthetic data. The conversion from float to int also adds some quantization noise: at 10m, you lose ~1cm of resolution, and at 40m, around 3cm. SANPO-Real can't promise that much resolution or range anyway, but SANPO-Synthetic can. Just something to be mindful of. I recommend double-checking your framework's expected clipping range and units.
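The numbers above can be sanity-checked with a short calculation. This sketch assumes the depth is stored as float16 in meters, as described in the question, and compares the gap between adjacent float16 values at 10 m and 40 m with the 1 mm step of the integer encoding. The variable names are only illustrative.

```python
import numpy as np

# Largest depth a uint16 millimeter PNG can hold, in meters.
max_range_m = np.iinfo(np.uint16).max / 1000.0

# Step size of the integer-millimeter encoding, in meters.
png_step_m = 1.0 / 1000.0

# Gap between adjacent float16 values at 10 m and 40 m, in meters.
# (Assumes float16 storage in meters; other dtypes have different spacing.)
float16_step_at_10m = float(np.spacing(np.float16(10.0)))
float16_step_at_40m = float(np.spacing(np.float16(40.0)))

print(max_range_m)          # 65.535
print(float16_step_at_10m)  # 0.0078125
print(float16_step_at_40m)  # 0.03125
```

Under that assumption, the float16 spacing is about 8 mm at 10 m and about 3 cm at 40 m, which is in the same ballpark as the figures quoted above and coarser than the 1 mm integer step.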
If you just want to visualize the depth maps for display, good old matplotlib can help you here:
import matplotlib.pyplot as plt
plt.imshow(sample['metric_depth'], cmap='turbo', vmin=0.0, vmax=40.0)
You can also apply the colormap object directly to the image, which avoids losing any resolution and also gets rid of the axes:
MAX_DEPTH = 40.0
colormap = plt.matplotlib.colormaps.get('turbo')
colored_image = colormap(sample['metric_depth'] / MAX_DEPTH)
colored_image = (255*colored_image).astype('uint8').squeeze()
pil_image = Image.fromarray(colored_image)
pil_image.save('/tmp/image.png')
Hope that helps!
When I download the depth map, it is stored as float16. How can I convert it to PNG format? Please let me know, thanks!