NVlabs / intrinsic3d

Intrinsic3D - High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting (ICCV 2017)
https://vision.in.tum.de/_media/spezial/bib/maier2017intrinsic3d.pdf
BSD 3-Clause "New" or "Revised" License

Help with poses from openMVG #3

Closed UannaFF closed 4 years ago

UannaFF commented 5 years ago

Hello! Thank you for your good work. I tried running this for the first time with my own data, collected with a Kinect v2, and I calculated the poses using openMVG SfM. When I ran it I got:

After running keyframe selection:

24 filenames loaded. Loading frame 0... RGB-D sensor

frames: 24

depth min: 0.1  depth max: 2
Depth camera: Camera model: size: 512x424  intrinsics: fx=364.815, fy=364.815, cx=256.972, cy=205.54  distortion: 0 0 0 0 0
Color camera: Camera model: size: 1920x1080  intrinsics: fx=1081.37, fy=1081.37, cx=959.5, cy=539.5  distortion: 0 0 0 0 0
keyframes_file ./fusion/keyframes.txt
keyframe_selection_window 20
Keyframe selection frame 0...

And after running AppFusion:

24 filenames loaded. Loading frame 0... RGB-D sensor

frames: 24

depth min: 0.1  depth max: 2
Depth camera: Camera model: size: 512x424  intrinsics: fx=364.815, fy=364.815, cx=256.972, cy=205.54  distortion: 0 0 0 0 0
Color camera: Camera model: size: 1920x1080  intrinsics: fx=1081.37, fy=1081.37, cx=959.5, cy=539.5  distortion: 0 0 0 0 0
SDF volume info: voxel size: 0.004  truncation: 0.02  integration depth min: 0.1  integration depth max: 2
Fusion... integrating frame 0... integrating frame 1... integrating frame 2... integrating frame 3... integrating frame 4... integrating frame 5... integrating frame 6... integrating frame 7... integrating frame 8... integrating frame 9... integrating frame 10... integrating frame 11... integrating frame 12... integrating frame 13... integrating frame 14... integrating frame 15... integrating frame 16... integrating frame 17... integrating frame 18... integrating frame 19... integrating frame 20... integrating frame 21... integrating frame 22... integrating frame 23...
correct SDF ... clear invalid voxels ...
Saving SDF (0 voxels) ...
Saving mesh ... Mesh (original): 0 triangles, 0 vertices
Mesh could not be generated!

I'm attaching a sample of my data: colorIntrinsics.txt, depthIntrinsics.txt, frame-000000 color, frame-000000 depth, frame-000000.pose.txt

I didn't change the min depth and max depth; maybe this is the problem. Should I set the min and max depth according to my depth files?

UannaFF commented 5 years ago

Also, I noticed that your poses are all generated relative to the first one; openMVG does this differently, and its translations can have much larger values, which may be affecting the algorithm.

robmaier commented 5 years ago

Hi,

Firstly, about the dataset: it looks like it consists of only 24 frames, right? I would skip keyframe selection then, since otherwise only 2 frames would be selected for the refinement. You can deactivate the keyframe selection file fusion/keyframes.txt by specifying the line keyframes: "" in both intrinsic3d.yml config files.

Secondly, you should probably pre-process your poses such that the first pose is the identity (you can do this by pre-multiplying all poses with the inverse of the first pose); see the sketch below. Otherwise the clip bounds are likely to be problematic, I guess (although they should be disabled in the beginning anyway).
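For illustration, a minimal sketch of that normalization, assuming the poses have already been loaded as 4x4 Eigen matrices (reading and writing the frame-XXXXXX.pose.txt files is left to your own pipeline):

```cpp
#include <Eigen/Dense>
#include <vector>

// Normalize a trajectory so that the first pose becomes the identity:
// pre-multiply every pose by the inverse of the original first pose.
std::vector<Eigen::Matrix4d> normalizeToFirstPose(const std::vector<Eigen::Matrix4d>& poses) {
    std::vector<Eigen::Matrix4d> normalized;
    if (poses.empty()) return normalized;
    const Eigen::Matrix4d firstInv = poses.front().inverse();
    normalized.reserve(poses.size());
    for (const Eigen::Matrix4d& T : poses)
        normalized.push_back(firstInv * T);  // first pose maps to the identity
    return normalized;
}
```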

Thirdly, about the depth fusion: you should first set both min_depth and max_depth to 0.0 in sensor.yml. Also, set all clip_y* values to 0.0 (and maybe adjust the voxel_size) in fusion.yml. You should then at least get a fused SDF volume and a mesh that let you verify the input.

UannaFF commented 5 years ago

Hello, thank you for your answer. It did work like that, but the results I got are weird. I tried with the following dataset: I have 264 frames, I kept my clipping options at 0 and activated keyframes. I set min depth to 0 and max depth to 7 (I looked at my depth images). I also pre-processed the camera poses by multiplying every matrix by the inverse of the first frame's pose.

https://drive.google.com/open?id=1LsFZAAyQoxnWZ2qm-84-wO3g0kJAy7we

from images like this one:

frame-000000 color

I got:

resultmeshintrinsis3d00

It's better than the first mesh, and I can see that some parts geometrically resemble the images, but there's still something wrong.

I added the mesh to the Drive folder.

robmaier commented 5 years ago

It looks like the camera poses are not right yet; in the easiest case you just have to invert them. Have you tried estimating your camera poses with VoxelHashing?

UannaFF commented 5 years ago

Yes, it looks like the poses are weird, and there's a mirror in the photos that could be affecting the reconstruction(?). I haven't tried VoxelHashing; I will try it now, although I don't know if I will use it in the end because I need a Linux-only solution. I'll post the results.

UannaFF commented 5 years ago

I have no access to Windows right now, so I'm trying to make the poses from openMVG work. I did the conversion you suggested and passed the inverted matrix I got for every camera. When I do this I get a good rotation alignment, but the translation is off.
snapfromthefront01 snap from the top00 frame-000008 color

robmaier commented 5 years ago

Wait - openMVG does not use the depth maps as initialization, right? Since Structure-from-Motion usually estimates geometry and poses from color images only, the absolute metric scale is lost. The translations then no longer have the right scale, while the rotations remain correct in principle. Have you tried running ORB-SLAM2 with both color and depth? It is also cross-platform and can use the depth as initialization.

UannaFF commented 5 years ago

Thank you for the hint about ORB-SLAM2, I'm working on it. Just one thing: are your poses specified in meters or centimeters?

UannaFF commented 5 years ago

OK, now I have good poses for my data. This is my result from AppFusion:

fusionresult png00

fusionresult png02

No tracks:

fusionnotracks00

The data looks like this:

frame-000000 color

frame-000000 depth

frame-000000.pose.txt

Now, I don't see the color of the other objects; I'm only getting white and some weird values near the objects. When I compare it to the Lion dataset, the Lion dataset already has good enough colored results after the fusion. Do you have any idea what could be happening?

robmaier commented 5 years ago

The .png color images are supposed to be 8-bit per channel RGB (i.e. 24-bit). You should be able to debug whether they are loaded correctly by adding a cv::imshow() in the code, e.g. as in the sketch below. Have you adjusted the camera intrinsics for the depth and color cameras?
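As a rough standalone sketch of that check (the file name here is just a hypothetical example; cv::IMREAD_UNCHANGED keeps the stored bit depth so you can inspect it):

```cpp
#include <opencv2/opencv.hpp>
#include <iostream>

int main() {
    // Hypothetical path to one of your input color frames.
    cv::Mat color = cv::imread("frame-000000.color.png", cv::IMREAD_UNCHANGED);
    if (color.empty()) {
        std::cerr << "Could not load image" << std::endl;
        return 1;
    }
    // Expected for intrinsic3d: 3 channels, 8 bits per channel (CV_8U == 0).
    std::cout << "channels: " << color.channels()
              << "  depth: " << color.depth() << std::endl;
    cv::imshow("color", color);
    cv::waitKey(0);
    return 0;
}
```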

UannaFF commented 5 years ago

Yes, for color I have:

1081.37 0 959.5 0
0 1081.37 539.5 0
0 0 1 0
0 0 0 1

and for depth I have:

364.815 0 256.972 0
0 364.815 205.54 0
0 0 1 0
0 0 0 1

Also, I double-checked the RGB images and found that I was storing them as 32-bit. I fixed it, but there's still some error.

I think the errors appear only in the problematic areas (with poor geometry reconstruction), because the wall seems almost perfect. I will try different environments to check.

This is my latest result after fusion:

reconstruction00

DRAhmadFaraz commented 5 years ago

@UannaFF @robmaier @tmbdev @vinodgro @mjgarland

Can you please guide me on how to get the "frame-00000X.pose.txt" files for each RGB image of our own custom dataset?

I will be thankful to you.

UannaFF commented 5 years ago

Hi @DRAhmadFaraz, at the beginning of the thread I was using openMVG to calculate poses: https://github.com/openMVG/openMVG . They have good documentation on how to use their global or sequential pipeline to obtain the poses and a sparse cloud from RGB images only. Unfortunately I couldn't make this work with those poses, so I started using ORB-SLAM2: https://github.com/raulmur/ORB_SLAM2 . There you can obtain the poses from monocular RGB, stereo, or RGB-D images. In my experience it was more precise because I was using the RGB-D approach.

robmaier commented 5 years ago

@UannaFF Thanks a lot for answering the question. The only thing to add is that the resulting poses from ORB-SLAM2 might have to be converted to 4x4 matrices, which are stored explicitly in each frame-XXXXXX.pose.txt file.

DRAhmadFaraz commented 5 years ago

@UannaFF, thanks a lot, I have installed it, but I now have one basic question: can I extract the poses directly in exactly the same format as given in the frame-00000X.pose.txt files?

Or, after I extract the poses, do I then have to convert them into the format used in frame-00000X.pose.txt?

DRAhmadFaraz commented 5 years ago

@robmaier Have you done that step, i.e. converting to a 4x4 matrix that is stored explicitly in each frame-XXXXXX.pose.txt file?

robmaier commented 5 years ago

@DRAhmadFaraz I don't have a prepared script for converting the camera poses from ORB-SLAM2 to our format. I have always used VoxelHashing to estimate the camera poses and then used a custom dataset converter to convert the .sens files from VoxelHashing to our format. If I remember correctly, ORB-SLAM2 outputs the poses in a single file, where each line contains a camera pose encoded as a quaternion - basically the format of the TUM RGB-D benchmark (as I said, I hope I remember this correctly). You could read this file line by line and then output a separate file for each line/pose, with the quaternions converted to 4x4 matrices (e.g. using Eigen). But I assume @UannaFF has already done what you are looking for?
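As a rough sketch of such a converter (not an official script; it assumes the TUM RGB-D trajectory line format "timestamp tx ty tz qx qy qz qw" and writes each pose as a 4x4 matrix without any additional inversion, which may need adjusting depending on the pose convention of your SLAM output):

```cpp
#include <Eigen/Dense>
#include <fstream>
#include <iomanip>
#include <sstream>
#include <string>

int main(int argc, char** argv) {
    if (argc < 2) return 1;
    std::ifstream traj(argv[1]);  // e.g. the trajectory file written by ORB-SLAM2
    std::string line;
    int idx = 0;
    while (std::getline(traj, line)) {
        if (line.empty() || line[0] == '#') continue;
        std::istringstream iss(line);
        double ts, tx, ty, tz, qx, qy, qz, qw;
        if (!(iss >> ts >> tx >> ty >> tz >> qx >> qy >> qz >> qw)) continue;

        // Build a 4x4 pose from translation + quaternion (Eigen expects w first).
        Eigen::Matrix4d T = Eigen::Matrix4d::Identity();
        T.topLeftCorner<3, 3>() =
            Eigen::Quaterniond(qw, qx, qy, qz).normalized().toRotationMatrix();
        T.topRightCorner<3, 1>() = Eigen::Vector3d(tx, ty, tz);

        // Write one file per pose: frame-000000.pose.txt, frame-000001.pose.txt, ...
        std::ostringstream name;
        name << "frame-" << std::setw(6) << std::setfill('0') << idx++ << ".pose.txt";
        std::ofstream out(name.str());
        out << T << "\n";
    }
    return 0;
}
```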

DRAhmadFaraz commented 5 years ago

@robmaier Thanks a lot for your kind response.

I want to run this code on my own dataset of monocular RGB images and get 4x4 pose matrices.

I just want to ask one question: can the VoxelHashing tool extract poses from monocular RGB images, and can we then use the custom dataset converter to convert the .sens files into 4x4 matrices?

robmaier commented 5 years ago

@DRAhmadFaraz VoxelHashing is a 3D reconstruction system specifically designed for RGB-D video. Consequently, it is not possible to estimate camera poses for a dataset of monocular RGB images only.

This framework has similarly been developed for RGB-D-based 3D reconstruction only; the initial 3D model is generated by fusing the depth maps into a Signed Distance Field. I have not applied it to monocular image data, but you could try to obtain an RGB-D sequence from a monocular sequence using a pipeline similar to the following:

DRAhmadFaraz commented 5 years ago

@UannaFF Dear, I have installed ORB-SLAM2 (https://github.com/raulmur/ORB_SLAM) and followed your instructions.

Now all I want is:
1) the script you use to get the poses from this SLAM algorithm and convert them into a 4x4 matrix in each file named frame-0000xx-pose.txt
2) the script you use to pre-process the camera poses by multiplying every matrix by the inverse of the first frame's pose

my email: faraz6313@gmail.com

I will be thankful to you. Best regards

chethanab16 commented 3 years ago

@robmaier @UannaFF @tmbdev @vinodgro @mjgarland As I am new to this, could you tell me more about how to estimate the camera poses using VoxelHashing ,can I use Bundle Fusion instead of Voxel hashing and, which custom dataset converter you used to convert the .sens files from VoxelHashing to intrinsic3d format, It would be very helpful