facebookresearch / egolifter

This is the official repository for "EgoLifter: Open-world 3D Segmentation for Egocentric Perception" (ECCV 2024).
https://egolifter.github.io/
Apache License 2.0

General questions about processing ego-videos #8

Closed SunghwanHong closed 2 weeks ago

SunghwanHong commented 3 weeks ago

Hi @georgegu1997!

I am currently working on a new project based on your code base. Big thanks for your code; it saved me a lot of time implementing the preprocessing stage for ego-videos.

Nevertheless, I am still familiarizing myself with all the processing code, and have a few questions.

  1. I am very new to egocentric videos, so it would be very helpful to know what is different from conventional exo-images. Could you provide a short bullet list (guideline) of the things that need to be taken care of for ego-videos compared to conventional processing? For context, I am trying to use a sparse set of consecutive images, e.g., 5 views, to reconstruct a scene. The inputs I will be using are camera poses, intrinsics, RGB images, and depth maps.

     - 0-1. If I want to resize the images, can I just resize the undistorted images and modify the intrinsics accordingly?
     - 0-2. For processing the depth maps (in the ADT dataset), do I need to process them with the exact same procedure as the RGB processing (undistortion)?
     - 0-3. While ADT has per-frame depth maps, Ego-Exo4D only has a global semi-dense point cloud, right? How should I obtain a per-frame (or per-timestep) depth map?

  2. In https://github.com/facebookresearch/egolifter/blob/d177089a153831e4cd592fb09c8f86f8a63d398c/scripts/process_adt_3dgs.py#L289, I read the 3 commented lines, but I could not fully understand them. If I resize the images and depth maps of ego-images, do I need to take care of the pose as well? For example, do I need to retrieve a different pose, since the comment says (2.5 ms for 1408x1408, 8 ms for 2880x2880) and you added `timestamp_ns + 2_500_000 # 2.5 ms`?

Thank you very much for your time and support!

Sunghwan

georgegu1997 commented 3 weeks ago

Hi, thanks for your interest!

0-1. If I want to resize the images, can I just resize the undistorted images and modify the intrinsics accordingly?

Yes, the undistorted images are already linear (pinhole camera model) images and can be resized like normal images, as long as you scale the intrinsics accordingly.
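A minimal sketch of what that looks like (plain OpenCV/NumPy; the function name and the half-pixel note are illustrative, not from the codebase):

```python
import cv2
import numpy as np

def resize_with_intrinsics(image, K, new_size):
    """Resize an undistorted (pinhole) image and scale K to match.

    image: HxW(x3) array, K: 3x3 intrinsics, new_size: (new_w, new_h).
    """
    h, w = image.shape[:2]
    new_w, new_h = new_size
    sx, sy = new_w / w, new_h / h

    resized = cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_AREA)

    K_new = K.astype(np.float64).copy()
    K_new[0, 0] *= sx  # fx
    K_new[1, 1] *= sy  # fy
    # Simple scaling of the principal point; for exact half-pixel-center
    # conventions use cx' = (cx + 0.5) * sx - 0.5 instead.
    K_new[0, 2] *= sx  # cx
    K_new[1, 2] *= sy  # cy
    return resized, K_new
```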

0-2. For processing the depth maps (in the ADT dataset), do I need to process them with the exact same procedure as the RGB processing (undistortion)?

I think you probably need a similar procedure, but it seems distort_by_calibration is only implemented for RGB images. Also, I think the interpolation could be tricky in this undistortion process (for depth you would not want to blend values across depth discontinuities). For this question, I suggest you open an issue in the projectaria_tools repo.
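For reference, the usual RGB undistortion flow with projectaria_tools looks roughly like this (a sketch assuming the documented API; the VRS path and the 512/280 output size and focal length are illustrative, not values from the EgoLifter scripts):

```python
from projectaria_tools.core import data_provider, calibration

# Illustrative recording path.
provider = data_provider.create_vrs_data_provider("recording.vrs")
src_calib = provider.get_device_calibration().get_camera_calib("camera-rgb")

# Target linear (pinhole) model: width, height, focal length, label.
dst_calib = calibration.get_linear_camera_calibration(512, 512, 280, "camera-rgb")

# Grab one RGB frame and rectify it to the linear model. For depth maps
# you would want nearest-neighbor-style resampling instead, so values are
# not blended across depth discontinuities.
stream_id = provider.get_stream_id_from_label("camera-rgb")
image = provider.get_image_data_by_index(stream_id, 0)[0].to_numpy_array()
rectified = calibration.distort_by_calibration(image, dst_calib, src_calib)
```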

0-3. While ADT has per-frame depth maps, Ego-Exo4D only has a global semi-dense point cloud, right? How should I obtain a per-frame (or per-timestep) depth map?

Yes, Aria glasses (in the wild) do not give a dense depth map. One thing you could do is run monocular depth models on the Aria images, although the results won't be metric depth.
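As one off-the-shelf option, MiDaS via torch.hub works like this (following its published usage; the model choice and file path are illustrative):

```python
import cv2
import torch

# Output is relative inverse depth, not metric depth.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = midas_transforms.small_transform

img = cv2.cvtColor(cv2.imread("aria_frame.png"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    prediction = midas(transform(img))
    # Upsample the prediction back to the input resolution.
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze().cpu().numpy()
```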

2. If I resize the images and depth maps of ego-images, do I need to take care of the pose as well? For example, do I need to retrieve a different pose, since the comment says (2.5 ms for 1408x1408, 8 ms for 2880x2880) and you added `timestamp_ns + 2_500_000 # 2.5 ms`?

If I remember correctly, 1408 and 2880 are the two "native" resolutions for Aria glasses, i.e., the resolutions of the image stream recorded by the glasses. You choose one setting before recording and it is fixed for that recording. The 2.5 ms was hardcoded in that script because I was assuming the 1408 resolution. Feel free to change it if needed.
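For example, the pose query could look like this (a sketch assuming projectaria_tools' MPS utilities; the CSV path and the helper name are illustrative):

```python
from projectaria_tools.core import mps
from projectaria_tools.core.mps.utils import get_nearest_pose

def query_pose_at_exposure_center(trajectory, capture_timestamp_ns,
                                  readout_delay_ns=2_500_000):
    """Query the device pose at the center of the rolling-shutter readout.

    readout_delay_ns: ~2.5 ms for the 1408x1408 stream (~8 ms for 2880x2880).
    Resizing the images afterwards does NOT change this value.
    """
    query_ns = capture_timestamp_ns + readout_delay_ns
    pose = get_nearest_pose(trajectory, query_ns)
    return pose.transform_world_device

trajectory = mps.read_closed_loop_trajectory("closed_loop_trajectory.csv")
```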

SunghwanHong commented 3 weeks ago

Thanks for the quick reply!

So for 0-3, would this be possible? Since there is also a semi-dense observation file, which seems to link to the semi-dense world point cloud, can I somehow link each pixel in the frame to one of the points observed at time step t?

For the last question, if I resize the image to 256×256, should I add something else to timestamp_ns other than 2_500_000? I'm also not sure whether I need to do anything to the camera extrinsics if I resize the image.

Finally, since I will be using a sparse set of images, would it be better to use the open-loop trajectory rather than the closed-loop trajectory as your code does?

Thanks!!!

georgegu1997 commented 3 weeks ago

So for 0-3, would this be possible? Since there is also a semi-dense observation file, which seems to link to the semi-dense world point cloud, can I somehow link each pixel in the frame to one of the points observed at time step t?

I think it's possible to extract the semi-dense observations according to the documentation here. But this at most gives you sparse per-pixel depth measurements, and there could be a lot of noise in the results.
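A rough sketch of that linking (field names follow the projectaria_tools MPS readers as I recall them, so treat this as pseudocode until checked against the docs; also note the observations live in the SLAM cameras' distorted pixel space, not the RGB stream):

```python
import numpy as np
from projectaria_tools.core import mps

# Illustrative file names; see the MPS documentation for the exact layout.
points = mps.read_global_point_cloud("semidense_points.csv.gz")
observations = mps.read_point_observations("semidense_observations.csv.gz")

uid_to_point = {p.uid: np.array(p.position_world) for p in points}

def sparse_depth_for_frame(query_ts, camera_serial, T_camera_world):
    """Collect (u, v, depth) samples for points observed in one frame.

    T_camera_world: 4x4 numpy matrix mapping world points into the camera.
    """
    samples = []
    for obs in observations:
        if obs.camera_serial != camera_serial:
            continue
        if obs.frame_capture_timestamp != query_ts:
            continue
        p_world = uid_to_point.get(obs.point_uid)
        if p_world is None:
            continue
        p_cam = T_camera_world @ np.append(p_world, 1.0)
        samples.append((obs.uv[0], obs.uv[1], p_cam[2]))  # z as depth
    return samples
```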

If I resize the image to 256×256, should I add something else to timestamp_ns other than 2_500_000?

No. The original images recorded by the camera are 1408x1408, and this determines the rolling-shutter readout delay. Resizing happens after recording and does not affect this delay.

Finally, since I will be using a sparse set of images, would it be better to use the open-loop trajectory rather than the closed-loop trajectory as your code does?

This depends on whether your application requires online inference (i.e., not seeing future images). The difference between the open- and closed-loop trajectories is how the bundle adjustment is performed. Please see here for an explanation.
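For completeness, both trajectory types have readers in projectaria_tools (paths illustrative):

```python
from projectaria_tools.core import mps

# Open-loop: causal odometry poses (local drift, no future information).
open_loop = mps.read_open_loop_trajectory("open_loop_trajectory.csv")

# Closed-loop: poses bundle-adjusted over the whole recording (uses
# future frames, so not suitable for strictly online settings).
closed_loop = mps.read_closed_loop_trajectory("closed_loop_trajectory.csv")
```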