KexianHust / ViTA

ViTA: Video Transformer Adaptor for Robust Video Depth Estimation

Modifying demo to use image sequences instead of video #3

Open vitacon opened 1 year ago

vitacon commented 1 year ago

Hello, I use MiDaS in my current project and it takes a sequence of PNGs as input and the result is also saved as grayscale PNGs. I expected ViTA to save both input and output frames to a temporary folder from where I could copy them but it does not seem to happen and all frames are kept in RAM...?

I suppose changing the code to use PNGs instead of MP4 should be rather simple but unfortunately I am no Python expert. =] Can you give me some hints? I suppose the main loop that goes through all frames (img_inputs) starts at line 133, right?

for i in range(0, img_num, seq_len - overlap * 2):
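For orientation, the stride of that loop header can be looked at in isolation. The sketch below only lists the start indices the `range` call produces. The values of `seq_len` and `overlap` are made-up examples, not ViTA's defaults, and the idea that each step covers `seq_len` frames beginning at `i` is an assumption that this thread does not confirm.

```python
def window_starts(img_num, seq_len, overlap):
    """Return the start indices produced by the loop header quoted above."""
    step = seq_len - overlap * 2
    if step <= 0:
        raise ValueError("seq_len must be larger than 2 * overlap")
    return list(range(0, img_num, step))


def window_bounds(img_num, seq_len, overlap):
    """Assumed frame span per step: seq_len frames from each start, clipped to img_num."""
    return [(s, min(s + seq_len, img_num))
            for s in window_starts(img_num, seq_len, overlap)]


if __name__ == "__main__":
    # Hypothetical numbers, chosen only for illustration.
    print(window_starts(10, 4, 1))
    print(window_bounds(10, 4, 1))
```

With an image folder, `img_num` would simply be the number of files found, so the loop itself should not need to change when the frames come from PNGs instead of a decoded video.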

KexianHust commented 12 months ago

I have added a function that lets our model take an image sequence as input. Give it a try!

vitacon commented 12 months ago

Thanks for the modification! =)

However, it is not quite there for me yet:

  1. ViTA does not seem to support the MiDaS switch "--grayscale"; it always uses the "Inferno" colormap instead.
  2. Even with "--format imgs", the result is always saved as MP4 rather than PNGs (it would also be nice to keep the original image names).
  3. The output video made from PNGs does not get a proper name; the file is called just ".mp4".

KexianHust commented 11 months ago

For the first question, you can add the following after line 308:

cv2.imwrite(os.path.join(output_path, 'frame_%04d.png' % (i + 1)), cv2.resize(predictions[i], dsize=(img.shape[1], img.shape[0]), interpolation=cv2.INTER_LINEAR))
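If `predictions[i]` is a floating-point depth map, it may need scaling to an integer range before it shows up as the expected grayscale PNG. Below is a minimal sketch of min-max scaling to 8 bits. The name `to_gray8` is hypothetical, and whether ViTA's predictions are already scaled at that point is not stated in this thread.

```python
import numpy as np


def to_gray8(depth):
    """Scale an arbitrary-range depth map to uint8 values in [0, 255]."""
    depth = np.asarray(depth, dtype=np.float32)
    d_min, d_max = float(depth.min()), float(depth.max())
    if d_max - d_min < 1e-8:
        # A constant map has no range to stretch; return black.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - d_min) / (d_max - d_min)
    return np.round(scaled * 255.0).astype(np.uint8)
```

Scaling every frame independently can make the brightness jump from frame to frame; using one shared minimum and maximum for the whole sequence avoids that.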

For the second and third questions, just put your folder under 'input_imgs', e.g., input_imgs/test/; the result will then be saved as 'test.mp4'.
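One possible explanation for an output called just ".mp4", offered as a guess rather than a confirmed diagnosis of ViTA's code: if the video name is derived with `os.path.basename` and the folder path ends in a separator, the basename is an empty string. The hypothetical helper `output_name_from_folder` below shows a normalization that avoids this.

```python
import os


def output_name_from_folder(folder, ext=".mp4"):
    """Derive an output file name from the last component of a folder path."""
    # normpath drops a trailing separator, so basename no longer returns "".
    name = os.path.basename(os.path.normpath(folder))
    return name + ext


if __name__ == "__main__":
    print(repr(os.path.basename("input_imgs/test/")))   # empty string
    print(output_name_from_folder("input_imgs/test/"))
```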

vitacon commented 11 months ago

Thanks! 👍

Actually, I don't need the output video, so I commented out all the videoWriter code and kept just this:

    for i in range(predictions.shape[0]):
        print("  exporting ", img_names[i])
        cv2.imwrite(os.path.join(output_path, img_names[i]), cv2.resize(predictions[i], dsize=(img.shape[1], img.shape[0]), interpolation=cv2.INTER_LINEAR))

However, I think an additional argument might be useful for other people...
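Such an argument could look roughly like the sketch below. The flag name `--save_frames` is hypothetical and does not exist in ViTA as far as this thread shows; it only illustrates how a boolean switch could select between writing frames and writing a video.

```python
import argparse


def build_parser():
    """Build a parser with one hypothetical switch for per-frame output."""
    parser = argparse.ArgumentParser(description="Sketch of an output-mode switch")
    parser.add_argument(
        "--save_frames",
        action="store_true",
        help="write each depth map as an image instead of encoding a video",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--save_frames"])
    if args.save_frames:
        print("would export individual frames")
    else:
        print("would write a video")
```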

vitacon commented 11 months ago

And I'm glad it was worth it - MiDaS versus ViTA. =)

https://github.com/KexianHust/ViTA/assets/58292841/a4d3482f-8a7b-498f-a6bf-0995a779e634

KexianHust commented 11 months ago

Glad to hear that!

vitacon commented 11 months ago

One more detail that might be useful to someone else: I sometimes use Czech letters in my folder names, and ViTA could not handle that. Apparently it is a known problem with Unicode paths in cv2.imread and cv2.imwrite, so I had to replace these lines:

READ

        # image = cv2.imread(img_namef)
        image = cv2.imdecode(np.fromfile(img_namef, dtype=np.uint8), cv2.IMREAD_UNCHANGED)

WRITE

        # cv2.imwrite(os.path.join(output_path, img_names[i]), cv2.resize(predictions[i], dsize=(img.shape[1], img.shape[0]), interpolation=cv2.INTER_LINEAR))
        is_success, im_buf_arr = cv2.imencode(".png", cv2.resize(predictions[i], dsize=(img.shape[1], img.shape[0]), interpolation=cv2.INTER_LINEAR))
        im_buf_arr.tofile(os.path.join(output_path, img_names[i]))