facebookresearch / consistent_depth

We estimate dense, flicker-free, geometrically consistent depth from monocular video, for example hand-held cell phone video.
MIT License
1.61k stars, 236 forks

Inconsistent initial depth values with the same image of different formats #30

Open duyanwei opened 3 years ago

duyanwei commented 3 years ago

Thank you for sharing this great work!

I've been playing with the demo and found that the initial depth values differ depending on the format of the input image. To reproduce, I made the following changes:

  1. Comment out this line: https://github.com/facebookresearch/consistent_depth/blob/e2c9b724d3221aa7c0bf89aa9449ae33b418d943/depth_fine_tuning.py#L170
  2. Replace it with color_fmt = pjoin(self.base_dir, "color_down_png", "frame_{:06d}.png") # load .png file
  3. This triggers the else branch here, which loads the image with OpenCV instead of the .raw loader used originally: https://github.com/facebookresearch/consistent_depth/blob/e2c9b724d3221aa7c0bf89aa9449ae33b418d943/loaders/video_dataset.py#L30
  4. Print the depth statistics with print(f'{torch.min(depth)} {torch.max(depth)} {torch.mean(depth)}') after this line: https://github.com/facebookresearch/consistent_depth/blob/e2c9b724d3221aa7c0bf89aa9449ae33b418d943/depth_fine_tuning.py#L190

Command:

python main.py --video_file data/videos/ayush.mp4 --path results/ayush --camera_params "1671.770118, 540, 960" --camera_model "SIMPLE_PINHOLE" --make_video --model_type monodepth2

Results (only the first 5 frames are shown to save space):

(Each row lists dmin, dmax, dmean, in that order.)

.raw format

6.328142166137695 47.315696716308594 16.583988189697266
5.077469825744629 48.07398986816406 16.365726470947266
5.110544204711914 48.893699645996094 16.422765731811523
5.226498126983643 45.44618606567383 16.323867797851562
5.302231311798096 40.730411529541016 16.2956600189209

.png format

6.994554042816162 84.51655578613281 26.014997482299805
6.27308988571167 100.8250503540039 27.64916229248047
6.835447311401367 94.29505157470703 26.967622756958008
7.421243190765381 95.34587097167969 27.17084312438965
7.507992744445801 91.67366027832031 26.8772029876709

Is this difference expected?
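One way to narrow this down is to load the same frame through both paths and compare the arrays directly before they reach the depth network. The sketch below is only an illustration on synthetic data: the helper compare_loaded_images is hypothetical (it is not part of the repository), and it assumes the two loaders might differ in channel order or value range, which nothing in this thread confirms.

```python
import numpy as np


def compare_loaded_images(ref, other, tol=1e-2):
    """Return the names of simple transforms of `other` that reproduce `ref`.

    Hypothetical helper: it only tests channel order and a 1/255 scaling,
    the two differences assumed possible here.
    """
    candidates = {
        "identical": other,
        "channels reversed": other[..., ::-1],
        "scaled by 1/255": other / 255.0,
        "scaled by 1/255 and channels reversed": other[..., ::-1] / 255.0,
    }
    ref = ref.astype(np.float64)
    matches = []
    for name, cand in candidates.items():
        cand = np.asarray(cand, dtype=np.float64)
        if cand.shape == ref.shape and np.max(np.abs(cand - ref)) <= tol:
            matches.append(name)
    return matches


# Synthetic stand-ins for the two loading paths (not real frames).
rng = np.random.default_rng(0)
rgb_float = rng.random((384, 224, 3)).astype(np.float32)  # float image in [0, 1]
bgr_uint8 = np.round(rgb_float[..., ::-1] * 255).astype(np.uint8)  # 8-bit, reversed channels

print(compare_loaded_images(rgb_float, bgr_uint8))
print(compare_loaded_images(rgb_float, rgb_float))
```

If the real arrays only match after one of these transforms, the same transform applied to the .png input should bring the depth statistics back in line. If none match, the difference lies elsewhere, for example in resizing or compression.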

kayleeliyx commented 3 years ago

The PNG file loads with shape (384, 224, 3), but reading the raw file gives something different:

import numpy as np
input_file = 'path/to/rawimage.raw'
npimg = np.fromfile(input_file, dtype=np.uint16)

npimg has 172042 elements. How should the raw data be reshaped? What is the difference between the raw and PNG files?
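The numbers above allow a quick size check. 172042 uint16 values are 344084 bytes, which is exactly 20 bytes more than a single-channel float32 image of 384 x 224 pixels. The sketch below works through that arithmetic and reads a synthetic file back under the assumption that the file is a 20-byte header followed by float32 pixels. That layout is a guess from the byte count, and read_raw_guess is a hypothetical helper; the repository's own raw loader is the authority on the real format.

```python
import os
import tempfile

import numpy as np

H, W = 384, 224      # spatial size of the PNG reported above
N_UINT16 = 172042    # element count reported above when reading as uint16

file_bytes = N_UINT16 * np.dtype(np.uint16).itemsize
payload_bytes = H * W * np.dtype(np.float32).itemsize
header_bytes = file_bytes - payload_bytes
print(file_bytes, payload_bytes, header_bytes)  # 344084 344064 20


def read_raw_guess(path, height, width, skip_bytes):
    """Hypothetical reader: skip an assumed header, then parse float32 pixels."""
    data = np.fromfile(path, dtype=np.float32, offset=skip_bytes)
    return data.reshape(height, width)


# Write a synthetic file with the assumed layout and read it back both ways.
expected = np.arange(H * W, dtype=np.float32).reshape(H, W)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "synthetic.raw")
    with open(path, "wb") as f:
        f.write(b"\x00" * header_bytes)  # placeholder header, contents unknown
        f.write(expected.tobytes())
    as_uint16 = np.fromfile(path, dtype=np.uint16)
    restored = read_raw_guess(path, H, W, header_bytes)

print(as_uint16.shape, restored.shape)
```

Reading with dtype=np.uint16 splits every float32 value into two meaningless halves, which is why the element count does not match any image shape. A three-channel float32 image of the same size would need 1032192 bytes of pixel data, so a file of this size cannot hold one.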