Improve depth-map estimation with a-priori knowledge

cdcseacave commented 4 years ago

Traditional depth-map estimation techniques have weak support for textureless areas (ex. uniform color walls, or water surfaces) and the fusion process is affected by sky reconstruction attempts existing in the estimated depth-maps.

One possible solution, that we will explore here, is to extend OpenMVS depth-map estimation algorithm to exploit some a-priori knowledge as image segmentation, for example generated by machine learning.

The plan is to be able to detect planar walls, sky, and water; any other type of object detection and segmentation is welcome, but not mandatory. Assuming we have a mask for each image containing the labels detected by the image segmentation, use them during depth-map estimation in few ways:

[ ] identify an open source project that can do accurate image segmentation and generate image masks containing labels at least for sky, water and planar-walls
[ ] add support to ignore image areas containing sky/water during estimation (simply let the depth-map empty)
[ ] for water and textureless planar-walls, once the depth-map is estimated, fill the empty patch representing the planar surface with a plane estimated from the depth estimates of the pixels at the border of the patch (similar to the approach explained in the paper: "Planar Prior Assisted PatchMatch Multi-View Stereo")
[ ] once all depth-maps are estimated in this way, add few more PatchMatch iterations per depth-map with a geometric consistency constraint on top of the ZNCC score in order to fine-tune the individual estimates estimated above, considering the neighbor depth-maps as well
[ ] use the image segmentation masks during fusion as an extra check to validate inliers; ex: even if there are pixels appearing in only 2 depth-maps, consider them inliers if the depth estimates agree and they have the same label as well
[ ] use the image segmentation masks during mesh texturing to generate along the color texture an object segmentation texture, that will be much more accurate than the individual image segmentations, by integrating the label information in the texture inference algorithm

cdcseacave commented 4 years ago

TODO: try https://github.com/facebookresearch/detectron2 for image segmentation, as suggested by @pmoulon

elliestath commented 4 years ago

Hi @cdcseacave, thanks for this initiative! Actually yes, we have been working towards some of the tasks that you mention. More particularly, we added the semantic constraint assuming planar surfaces for a "flat" semantic class, adding this as a constraint in the depth map filtering step (FilterDepthMap). However, instead of using planes as geometry, we rather fill in the depth maps using NN gap interpolation to all blank pixel that belong to this "flat" class. This had indeed an effect on the depth maps, but the point cloud does not seem to be much optimized (gaps still exist). Do you think this may be of help?

roby23 commented 4 years ago

Hello @cdcseacave,

regarding the step "add support to ignore image areas", we were able to ignore areas of the images but we did it at the end of depth maps estimations. We tried passing the mask to the DepthEstimator::MapMatrix2ZigzagIdx function in order to leave the depth map empty, but we got wrong results because it is called only on the first image and the mapped coords are then used on the other images.

Also executing DepthEstimator::MapMatrix2ZigzagIdx on every image with its own mask led to wrong results, because depth is then estimaded using the neighbours images and so in the confidence maps the parts to ignore are filled with wrong confidence values.

We tackled this problem by simply setting the confidence value of the masked pixels to 0 after the depth map computation.

Other than that we also have tried to filter out the masked values in FilterDepthMap and FuseDepthMaps with good results.

Do you think there is a better way to do this?

cdcseacave commented 4 years ago

@elliestath Yes, gap interpolation should make the depth-map "look" complete, but the filled in values are not accurate, and get removed during fusion. The proposed paper suggests a better way to deal with this. Additionally, adding an extra set of iterations doing also geometric consistency should help further improve accuracy.

cdcseacave commented 4 years ago

@roby23 I have just added mask support in latest develop. It works fine for me, pls confirm on your side.

roby23 commented 4 years ago

Hello @cdcseacave

so now that we can mask out unwanted pixels from the reconstruction, I think that the nIgnoreMaskLabel should be a list of values, in order to mask out different classes. Does it make sense to you? Also, do you think it is a good approach to load the masks along with the images (in LoadInterface()) and store them in the DepthData struct (add something like labelMap)? In this way we can have the masks (labels) during all the steps of the reconstruction.

cdcseacave commented 4 years ago

yes, nIgnoreMaskLabel can be easily extended to a list (values separated by comma in command line and use Util::strSplit() to split it in an array)

LoadInterface() is for sure not a good place to load the masks (the images are not loaded there either), and for now we do not need the mask anywhere-else, at least for densification; if the mask is needed in other modules, it is their responsibility to manage them, the same way the manage images loading; what we can do is to move ImportIgnoreMask() in Image so that we can reuse it in other modules as well, but we will do that when the time comes, as in other modules we might need to process the mask differently, so same function will not work

roby23 commented 4 years ago

Ok, I can take care of the nIgnoreMaskLabel array.

About the masks loading maybe I didn't explain it well. We want to use segmented images as masks (to exclude sky/etc) with nIgnoreMaskLabel and then we want to use them to check for label consistency during PatchMatch or depth map fusion. So you are right, ImportIgnoreMask() is not suitable for loading the labels, we just need to load the image as it is when we need it. My point was just to preload them and keep them stored in some data structure (could be Image or DepthData) and apply them as masks or get label values when needed.

cdcseacave commented 4 years ago

I understand, but as I said that is not how things are done in OpenMVS (in order to lower the memory usage). The reference and target images plus the reference mask are loaded during depth-map estimation and then immediately released. Later, after all depth-maps are estimated, during fusion the images and the image-segmentations are loaded again (the fusion can be implemented in a better way not to use load images at once, but make use of a FIFO cache).

So as it is now is fine, just during fusion load the image segmentation.

elliestath commented 4 years ago

@cdcseacave thanks for the suggestion! This paper follows a probabilistic model to assign the best planar prior to points, that is far away from what has been done till now in OpenMVS. Do you suggest that regions that are considered "planar" should follow this approach and not the one already implemented or should it work in a complementary way?

cdcseacave commented 4 years ago

The paper is building upon the framework proposed in COLMAP which follows a probabilistic model. However, similarly can be applied in OpenMVS; in ScorePixelImage():

score = photometric_score + alpha * plane_prior + beta * geometric_prior

In practice this is a good compromise between correctness and speed.

cpheinrich commented 4 years ago

These are interesting ideas! I bet sky removal will work ;)

CanCanZeng commented 4 years ago

Hi, I find a very good project that can segment sky area very accurately and do not have much false positive case. Its https://github.com/jiupinjia/SkyAR, although its target is not to do sky segmentation, but the intermediate product really give a good result. Hope it will be helpful @cdcseacave

cdcseacave commented 4 years ago

thx @CanCanZeng I read the paper a while back, it is interesting, though not sure how accurate is the segmentation per image, as they seem to have also a post processing step that targets exactly the border between sky and the rest of the scene; I didn;t have a chance to break the code and output only the raw segmentation mask

in the end though, identifying the sky is helpful, but more importantly we need here a segmentation algorithm that identifies planar areas to improve the accuracy of the foreground; sky removal only affects the noise at the border of the foreground structure

CanCanZeng commented 4 years ago

Yes, you are right @cdcseacave , the sky segment is only helpful in noisy removal, that's how I'm using it now.

If you want to just output the raw segment, you can use my version code, just set "save_jpgs" flag to "true" in config file https://github.com/CanCanZeng/SkyAR/blob/89a571eac034527ba7b32b168ebd31f8e901c7a6/config/config-just-skymask.json#L21 I just write a function to output the raw sky mask, so the mask is not resized.

essebbaninaim commented 3 years ago

Hi everyone,

Im currently working on a way to Improve depth-map estimation.

Did you guys find out a solution?

Thanks.

CanCanZeng commented 3 years ago

Hi @cdcseacave , there is a new project about estimate depth map with planar prior, might it will be helpful! https://github.com/GhiXu/ACMP

yuancaimaiyi commented 3 years ago

reference paper:Semantically Derived Geometric Constraints for MVS Reconstruction of Textureless Areas depth depth1

Livan89 commented 3 years ago

Hello. Thankful for everything they do, it's great.

@cdcseacave is there any update regarding generating textureless areas like walls?

Thank you very much for the quick responses and all you do for the Gen 3D community.

cdcseacave commented 3 years ago

We are working on it, but nothing to release yet.

Livan89 commented 3 years ago

Thanks a lot @cdcseacave . I will be pending, I make a lot of use of the MVS densification process for studies, each update of you is great. I am at your service for any help you need.

colorfulCloud commented 3 years ago

@cdcseacave manual masks are also very useful for applications like autonomous driving, to filter out moving objects, looking forward to your update.

helloycr commented 2 years ago

We are working on it, but nothing to release yet.

looking forward to your update.

lbrianza commented 1 year ago

Hello @cdcseacave ! Thank you for all your great work. I was wondering if there is any update regarding this specific topic? I currently use openMVS a lot, and as I am looking for ways to improve the reconstruction process, I may be able to give a hand.

cdcseacave commented 1 year ago

I did not work myself on this, but from what I understood from those that contributed it is hard to come with a-priori depth data good enough to help improve the openMVS reconstruction more than the already self supervision (the geometric weighting already implemented).

One direction that should be doable though is detect at least sky and water, to mask them out.

What exactly is your interest? How do you want to use OpenMVS? Maybe there is a way to improve results for your needs.

lbrianza commented 1 year ago

I did not work myself on this, but from what I understood from those that contributed it is hard to come with a-priori depth data good enough to help improve the openMVS reconstruction more than the already self supervision (the geometric weighting already implemented).

One direction that should be doable though is detect at least sky and water, to mask them out.

What exactly is your interest? How do you want to use OpenMVS? Maybe there is a way to improve results for your needs.

Hi @cdcseacave , basically I am using OpenMVS for 3D reconstruction of indoor scenes. The method works quite nice as long as there are many textures in the scene, however the final scene always look incomplete whenever there are featureeless surfaces (like flat walls, which occur frequently), as we know the limitations of the method in this cases. So, I'm especially interested in the reconstruction of walls. I was wondering if there was any room for improvement here. I worked mostly on the pre-processing of the images (noise reduction, histogram equalization), which generally improved the reconstruction, but not much improvement on the reconstruction of textureless surfaces so far.

cdcseacave commented 1 year ago

image preprocessing is indeed helpful and if you have a general algorithm that improves most scenes, pls propose a PR and I'll add it to OpenMVS

as for textureless scenes, that is a topic where ML has the biggest chance to improve the results; do you have experience in machine learning? can you provide a module that masks textureless regions in the input images and semantically segments them?

lbrianza commented 1 year ago

For the image processing, I wrote a script then applies clahe and denoising (using opencv's fastNlMeansDenoising) on each image, but I need to adapt it a bit first as it takes a video as input (forgot to mention - I start from a video rather than photos, so in general tha frames have lower resolution/more noise - thus need preprocessing). I can propose a PR in case, but please note that I tested this procedure only on indoor scenes, not sure how it works on other situations (I suspect that if the original photos are already good enough it may not give any improvement)

For the textureless scenes, yes indeed I was searching for some neural networks that can detect featureless regions, however I have yet to find any useful one for this specific case - I see a few models that target the detection of textureless objects, but I'm not very sure they can be applied here. I can give it a try though, if there are better clues on this topic at the moment.

Some time ago I bumped into this article where the authors show improvements in the 3D reconstruction of textureless areas by using an algorithm called "Wallis filter". I tried to replicate it by implement the algorithm by myself, but in my case it didn't lead to any improvement - my guess is because they were using photos of much higher quality (working directly on the RAW formats) on outdoor scenes, which are normally easier to process given better light/less noise.

cdcseacave commented 1 year ago

I also tried the "Wallis filter" a while back, and I remember I did not see any improvement either, in any stage of the the entire photogrammetry pipeline

amughrabi commented 11 months ago

TODO: try https://github.com/facebookresearch/detectron2 for image segmentation, as suggested by @pmoulon

This is a fascinating topic! I suggest other self-supervised vision transformer approaches, such as DinoV2. I don't know if the idea is valid in this ticket context, but wouldn't it be straighter to have the RGB-D as an input, where the depth is estimated in the image via DinoV2? The idea is to have the estimated depth per image and then take this information to correct or validate the current depth estimation. But that’s just my two cents!

cdcseacave commented 11 months ago

I think you talk about https://dinov2.metademolab.com in which case the depth map estimated is from a single image, not only that estimating depth from a single image is crazy (ill posed problem) and is very poor quality, even if it would have been accurate it lacks scale, so it will not match the other depth maps from the other images

amughrabi commented 11 months ago

I think you talk about https://dinov2.metademolab.com in which case the depth map estimated is from a single image, not only that estimating depth from a single image is crazy (ill posed problem) and is very poor quality, even if it would have been accurate it lacks scale, so it will not match the other depth maps from the other images

Regarding the scaling, I'm not sure, I recommend you check, for the nyu-depth model, what the dataset provides as ground truth and walk back from there: https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html

helloycr commented 9 months ago

I have done a lot of work on OpenMVS based on semantic priors. If you need it, please contact me (qq: 956048984)

cdcseacave commented 9 months ago

I am interested in seeing some of the results you got using semantic priors, can you share some results?

and can you make a PR with the code using semantic priors?

cdcseacave / openMVS

Improve depth-map estimation with a-priori knowledge #585