uulo opened 2 years ago
If there is only one fixed camera, COLMAP will fail. If there are multiple cameras, COLMAP should be able to reconstruct the camera poses.
How is the preprocessing altered when you have, for example, multiple cameras of the same scene? Do you input the video data from those cameras as one, sequential, continuous video?
The first step is to reconstruct the scene: you can use one image per camera to reconstruct the background scene. Once you have obtained the camera poses and the sparse scene point cloud, you can segment the ground plane, estimate SMPL, estimate the scale, etc. (assuming all the cameras are synced). One thing to note is that you can use multiple views to optimize the SMPL estimates; that's out of the scope of this repo, but you can have a look at: https://github.com/zju3dv/EasyMocap
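To make the "one image per camera" step concrete, here is a minimal sketch of the kind of preprocessing you could do before running COLMAP on the background. Everything here is an assumption for illustration: the function name `collect_colmap_input`, the per-camera folder layout, and the frame file names are hypothetical, not part of ml-neuman's actual pipeline.

```python
import shutil
from pathlib import Path

def collect_colmap_input(camera_dirs, frame_name, out_dir):
    """Copy one synchronized frame per camera into a flat folder
    that COLMAP could read as its image directory for background
    reconstruction.

    camera_dirs: per-camera folders, each holding synced frames
    frame_name:  file name of the synchronized frame, e.g. "000000.jpg"
    out_dir:     flat output folder for COLMAP's --image_path
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for cam in map(Path, camera_dirs):
        src = cam / frame_name
        # Prefix with the camera folder name so file names stay unique.
        shutil.copy(src, out / f"{cam.name}_{frame_name}")
    return sorted(p.name for p in out.iterdir())
```

With the resulting flat folder you would then run COLMAP's usual feature extraction, matching, and mapping stages on just those per-camera images to recover the camera poses and the sparse point cloud.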
Excuse my naivete, but does this mean that an input video of a scene filmed by two synced cameras must be organized as: Frame 1: camera/perspective 1; Frame 1: camera/perspective 2; ... ; Frame N: camera/perspective 1; Frame N: camera/perspective 2?
Or should it be Frames 1-N, camera/perspective 1; then Frames 1-N, camera/perspective 2? Or should the synced footage of both cameras be juxtaposed for each frame?
I think you can check the ZJU-MoCap dataset; it uses multiple cameras.
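For what it's worth, multi-camera datasets like ZJU-MoCap typically keep one subfolder per camera with frames matched by index across cameras, rather than interleaving all views into a single video stream. Here is a small sketch that checks a layout of that shape; the function name `check_synced_layout` and the exact folder convention are my assumptions, so verify against the actual dataset before relying on them.

```python
from pathlib import Path

def check_synced_layout(root):
    """Check a per-camera layout: one subfolder per camera, each
    containing the same set of frame file names, so frames are
    matched by index across cameras instead of interleaved.
    """
    root = Path(root)
    cams = sorted(d for d in root.iterdir() if d.is_dir())
    frame_sets = [set(p.name for p in cam.iterdir()) for cam in cams]
    if not frame_sets:
        return False  # no camera folders found
    # Every camera must expose the identical set of frame names.
    first = frame_sets[0]
    return all(s == first for s in frame_sets[1:])
```

So to answer the ordering question directly under this assumption: neither interleaved nor concatenated into one video; each camera's frames live in their own folder, and synchronization comes from matching frame names/indices across folders.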
So are you saying that ml-neuman currently cannot handle input data for multiple cameras, and that one would need to integrate EasyMocap within ml-neuman to do that?