alicevision / Meshroom

3D Reconstruction Software
http://alicevision.org

[question] Need help on drone + ground workflow #1437

Closed HarikalarKutusu closed 2 years ago

HarikalarKutusu commented 3 years ago

Describe the problem

I'm not new to photogrammetry, but I'm not an expert either. I've used it for our museum objects. This time I tried a larger project.

I'm reconstructing a small town square. I have drone images, drone videos, and pictures taken from the ground. I could reconstruct from the drone images (not the videos), but of course the building fronts are not acceptable. I could (somewhat) reconstruct from the ground images, but the ground itself is not reconstructed and the tops of the buildings have meshing problems - as expected. I tried to augment the "ground project" with nearby drone shots and vice versa, but the cameras in the second set never get reconstructed (not a single camera).

Drone: DJI Phantom 4 Pro (20 MP images). Ground: Panasonic GH5 on a DJI gimbal (20 MP images).

I use: SIFT + AKAZE, describer preset high & high, downscale=1 (the only settings I could get results with).

I cannot take more pictures as the town square is now being renovated. These pictures are from before the renovation.

I've been working on this for two months now; any help would be much appreciated.

Dataset

Sample drone shot: image

Sample ground shot: image


natowi commented 3 years ago

There are a few things you can try, but this really depends on how structured your datasets are and how thoroughly you captured the area.

A) Reconstruct the drone and ground footage separately and merge them later, for example in Meshlab.
B) Split your dataset into regional patches and augment the graph systematically; the overlap between the two datasets within a patch should be significant.
C) A mixed approach.

natowi commented 3 years ago

Here is a quick sample I did in OpenCV to demonstrate the issue you face: image. The features detected in the left image need to be matched with the right image, but the target area (marked in red) makes up only a fraction of the second image, so there are plenty of other, distracting features. The overlap between images should ideally be 60%+, which is not the case in this example. Start with an area where you have an equal number of images from your two cameras and sufficient overlap, and augment your dataset from there.
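For reference, that kind of quick matching test can be reproduced with a few lines of OpenCV (a minimal sketch with placeholder file names, not the exact script used for the screenshot):

```python
# Detect SIFT features in two images, keep the matches that survive Lowe's
# ratio test and report the count - a rough proxy for how much usable overlap
# the pair has. File names are placeholders.
import cv2

img1 = cv2.imread("ground_shot.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("drone_shot.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

matcher = cv2.BFMatcher()
good = []
for pair in matcher.knnMatch(des1, des2, k=2):
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])

print(f"{len(kp1)} / {len(kp2)} keypoints, {len(good)} ratio-test matches")

# Draw the surviving matches, similar to the screenshot above.
vis = cv2.drawMatches(img1, kp1, img2, kp2, good, None)
cv2.imwrite("matches.jpg", vis)
```

If only a handful of matches survive for a drone/ground pair, Meshroom's feature matching will likely struggle with that pair for the same reason.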

The hard shadows complicate things a little, so it is best to start with an area that is well lit and has no reflections. Structure your dataset by location if you did not capture the images systematically.
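If your images carry GPS EXIF (the DJI shots should), you could even do the location grouping semi-automatically. A rough sketch with Pillow, binning images into coarse location patches by rounded coordinates (the folder name and bin size are placeholders, not a tested tool):

```python
# Bin images into coarse location "patches" using the GPS EXIF coordinates.
# Assumes the images actually carry GPS data; adapt the folder and bin size.
from collections import defaultdict
from pathlib import Path
from PIL import Image

def dms_to_deg(dms, ref):
    # EXIF stores degrees/minutes/seconds as rationals.
    deg = float(dms[0]) + float(dms[1]) / 60 + float(dms[2]) / 3600
    return -deg if ref in ("S", "W") else deg

patches = defaultdict(list)
for path in sorted(Path("images").glob("*.JPG")):
    gps = Image.open(path).getexif().get_ifd(0x8825)  # GPS IFD
    if not gps or 2 not in gps or 4 not in gps:
        continue
    lat = dms_to_deg(gps[2], gps.get(1, "N"))  # GPSLatitude / GPSLatitudeRef
    lon = dms_to_deg(gps[4], gps.get(3, "E"))  # GPSLongitude / GPSLongitudeRef
    # Rounding to 4 decimals gives cells of roughly 10 m; tune to your patch size.
    patches[(round(lat, 4), round(lon, 4))].append(path.name)

for cell, names in sorted(patches.items()):
    print(cell, len(names), "images")
```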

HarikalarKutusu commented 3 years ago

Thank you for answering. I am already using the technique you describe in B. After 20+ failures (and a lot of learning) I managed to group the images: start with a main set (~100 pictures) taken from the middle of the square and augment it with regional/closer-shot sets (6-7 of them). But it only worked for the ground-level images.

The hard shadows in the drone shots are indeed problematic, but those images were taken by the municipality and shared with us; they unfortunately (perhaps intentionally) chose a sunny day. As the square is a rather crowded area, it was not possible for us to get a permit to fly our own drone for near shots (i.e. a bit higher than the roofs). That would have solved the problem, I think...

The ground-level pictures I took are somewhat systematic, but there are many moving objects (people and small electric buses). People are easily removed, but some vehicles parked during the shoot and then drove away, which caused some trouble. They need to be cleaned up.

The problem is the overlap you mentioned. The drone shots are a grid plus one point-of-interest orbit. They were taken at a higher altitude to see the whole square, so the buildings are small in the image. Is it possible to crop identically sized areas from them so that the "object" fills the image? Would Meshroom be able to infer the camera characteristics (I assume the EXIF data should be removed)?

If not, I'm afraid I'll have to use method A, i.e. merging them in Meshlab. That would be tough with those millions of points/faces...

Thank you for the insight again.

natowi commented 3 years ago

People are easily removed, but some vehicles parked during the shoot and then drove away, which caused some trouble. They need to be cleaned up.

If the moving people or vehicles only make up a really small portion of the image, they do not necessarily need to be removed. How do you remove them at the moment? Your edits could introduce artefacts that can be picked up by the algorithms and cause false feature detections. Image masking would be helpful, but it is sadly not implemented in the current release.

Is it possible to crop identical-sized areas from them so that the "object" fills the image?

no

Will it be possible for Meshroom to infer the camera characteristics

yes

using Meshlab to merge them. That would be tough with those millions of points/faces...

Meshlab can handle it, no problem.

But there is another trick you could try:

Pick an area where your drone shots and ground images overlap (maybe the drone's take-off point?). Then branch off into augmentations from this initial reconstruction. You can later merge the two branches with the SfM alignment node.

overlap -> SfM -> augmentation 1 -> augment with ground images
overlap -> SfM -> augmentation 2 -> augment with aerial images
both augmented branches -> SfM alignment -> depthmap -> texturing

You would need to reconnect the input of the second augmentation so that it augments the overlap reconstruction and not the ground-image reconstruction.

I can't promise this will work, but experimenting with node graphs can solve some issues. If this does not work or you don't have the time, simply use Meshlab to align both meshes. It works quite well.

HarikalarKutusu commented 3 years ago

Thank you...

How do you remove them at the moment? Your edits could introduce artefacts that can be picked up by the algorithms and cause false feature detections. Image masking would be helpful, but it is sadly not implemented in the current release.

I haven't yet. I was planning to use external software like MeshLab, VisualSFM and/or Blender to get rid of unwanted artifacts and mesh areas, but I have no idea how to deal with the textures...

My main idea was to get a high-resolution base model and simplify it if needed, e.g. for a WebGL use case, or use it in a Blender animation, etc.

Is it possible to crop identically sized areas from them so that the "object" fills the image?

no

To prevent a possible misunderstanding: I was not implying that Meshlab should do it. I will crop them manually with software like Photoshop, say 1000x800 crops out of the 5000x4000 originals, and feed them to Meshroom as an augmentation. As Meshroom can infer the camera intrinsics, this might work, I think. I might as well go ahead and try, of course :)

As a test, I will try the method you suggest, i.e. location-based augmentation, ground-drone mixed...

I just got my first result from the whole image-set taken from the ground. Here:

For reference, the final mesh has 6.1 M triangles...

image

natowi commented 3 years ago

Do you use the default graph for the reconstruction? Did you try the dsp-sift describer? It performs better than SIFT and AKAZE in many cases and will probably become the new default in the next release. It could help you match more of your images and maybe even match the images from both datasets.

You can of course give cropping a try, but there are numerous things that can go wrong, resulting in the images not being matched. I would use the same bounding-box dimensions on all images when exporting the crops - this is basically a virtual zoom. Meshroom will try to approximate the camera intrinsics, and having the same dimensions and (no or fake) metadata will group the crops together so they share the same settings in CameraInit. Cropping images is generally not recommended, especially if the metadata is kept.
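The export could be as simple as this Pillow sketch (paths and the crop box are placeholders; it only illustrates the fixed output size and the metadata drop):

```python
# Cut a fixed-size 1920x1080 window out of every source image and save it
# without passing the original EXIF along, so Meshroom groups the crops under
# a new, approximated intrinsic in CameraInit. Paths and box are placeholders;
# the box position may vary per image as long as the output size stays the same.
from pathlib import Path
from PIL import Image

CROP_W, CROP_H = 1920, 1080
LEFT, TOP = 1500, 900          # move this per image to keep the target centred

out_dir = Path("crops")
out_dir.mkdir(exist_ok=True)

for path in sorted(Path("drone_images").glob("*.JPG")):
    img = Image.open(path)
    crop = img.crop((LEFT, TOP, LEFT + CROP_W, TOP + CROP_H))
    # Not passing exif= here should leave the crop without the full-frame
    # metadata (verify with an EXIF viewer), so the original focal length is
    # not wrongly applied to the "zoomed" image.
    crop.save(out_dir / path.name, quality=95)
```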

Trying to get your not-so-ideal datasets to reconstruct as a whole will likely require more time and trials. I would simply use the yellow markings on the plaza and the rooftop edges as reference points to align your meshes in Meshlab. You could even use the yellow lines to scale your model to real-world size in Meshlab, if you measure(d) the distance on site. (In Meshlab, temporarily disable the texturing to improve performance, otherwise it may lag a little.)
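The scaling itself boils down to a single ratio; for illustration (made-up numbers, and Meshlab's matrix transform does the same thing interactively):

```python
# Scale factor from one reference distance: measure the same yellow line on
# site and in the unscaled model, then apply the ratio uniformly to every
# vertex. Numbers and file names are made up for illustration.
import numpy as np

real_length_m = 12.4          # distance measured on site (hypothetical)
model_length = 0.85           # the same line measured in the raw mesh
scale = real_length_m / model_length

vertices = np.load("mesh_vertices.npy")       # hypothetical Nx3 vertex array
np.save("mesh_vertices_scaled.npy", vertices * scale)
print(f"scale factor: {scale:.3f}")
```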

Next time you have a project like this, try to make sure the images are taken from a similar distance and all surfaces are covered with 60%+ overlap.

HarikalarKutusu commented 3 years ago

Do you use the default graph for the reconstruction? Did you try the dsp-sift describer? It performs better than SIFT and AKAZE in many cases and will probably become the new default in the next release. It could help you match more of your images and maybe even match the images from both datasets.

I used the latest 2020 release first. My initial tests were unsuccessful, and I read in multiple places that the results depend heavily on the version, so I upgraded to 2021.1.0. After that I ran many trials on smaller datasets with every possible setting I could understand from the documentation, including dsp-sift only, sift only, and dsp-sift + akaze. I will retry the set with your location-focused method; maybe that will solve it.

Do you mean dsp-sift only?

I would use the same bounding box dimensions on all images to export cropped images. This will be similar to a virtual zoom.

Exactly what was in my mind.

Trying to get your not-so-ideal datasets to reconstruct as a whole will likely require more time and trials. I would simply use the yellow markings on the plaza and the rooftop edges as reference points to align your meshes in Meshlab. You could even use the yellow lines to scale your model to real-world size in Meshlab, if you measure(d) the distance on site.

Do you mean editing the images to add yellow marks? I already have 2D AutoCAD drawings of the area, so dimensions are not a problem.

Next time you have a project like this, try to make sure the images are taken from a similar distance and all surfaces are covered with 60%+ overlap.

Indeed! That square is on the Princes' Islands (Istanbul), which can only be reached by public transportation, which was restricted due to the pandemic lockdowns, and I had not had my vaccine shots at that time. I could only go to the island once, for the opening of an exhibition I designed. In fact, the shots I took from the ground do have 60%+ overlap, but there were the other factors I mentioned. I should have taken 360-degree panoramic-style shots.

image

natowi commented 3 years ago

Do you mean dsp-sift only?

When selecting dsp-sift you normally don't need akaze, which was recommended in previous versions to increase the number of features being detected. sift will always run in the background for now, too, so disabling it has no effect. Sometimes selecting all three describers can make sense. Here is an example: image

Do you mean editing the images to put yellow marks? I already have 2D Autocad drawings of the area, dimensions are not a problem.

When aligning or scaling meshes you need to place reference points in the scene. Easy-to-find markers like the yellow markings on the plaza are a plus.

In fact, the shots I took from the ground do have 60%+ overlap, but there were the other factors I mentioned. I should have taken 360-degree panoramic-style shots.

You can use a 360 pano camera, but there are a few things to consider, like resolution and lens distortion. It has the benefit that it can be loaded into Meshroom with a rig constraint, which saves some time calculating the camera positions, as the images share the same location.

Do not take "360 deg panoramic style shots"! Photogrammetry requires camera motion (physical distance between two shots); you don't gain much from capturing images from the same stationary spot (there is always an exception to the rule). In general, capturing images from a stationary point makes you assume you captured enough images and covered all surfaces with enough overlap, when in fact your viewpoint did not change much and your dataset is pretty much garbage (for photogrammetry ;) ).
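For intuition, the textbook stereo relation (not Meshroom internals) shows why the baseline matters: depth is recovered from parallax, and parallax shrinks to nothing when the camera barely moves. A tiny numeric illustration with made-up values:

```python
# Parallax (disparity) for a point at distance Z, seen from two cameras that
# are a baseline B apart: disparity ~= f * B / Z (f in pixels).
# With B near zero - panning from one spot - there is nothing to triangulate.
f_px = 3000.0     # focal length in pixels (made-up but plausible)
Z = 30.0          # distance to a facade, metres

for B in (5.0, 1.0, 0.1, 0.0):      # baseline between two shots, metres
    disparity = f_px * B / Z
    print(f"baseline {B:4.1f} m -> {disparity:7.1f} px of parallax")
```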

There are numerous guides on capturing images for photogrammetry (I collected some here: https://github.com/alicevision/meshroom/wiki/Tutorials); I warmly recommend reading up on the best practices to avoid issues like this. Preparation is everything. This is a good introductory summary on the topic: https://d32ogoqmya1dw8.cloudfront.net/files/getsi/teaching_materials/high-rez-topo/sfm_field_methods_manual.v2.pdf

For capturing in places where you can't get permits for drones, I highly recommend telescopic poles (photo masts), DIY or commercial.

HarikalarKutusu commented 3 years ago

Awesome telescopic poles :) I also have a small pole setup that can reach up to 4 meters, but I wouldn't use it with my DSLRs :) It is very good for my lightweight 360 cameras, though.

Thank you for taking the time to cover shooting techniques. Although I have read/watched many tutorials, I must have missed that point. I've been doing panoramic photography & virtual tours since the late '90s, so this must be a bad habit.

Now it is time to try what you proposed.

HarikalarKutusu commented 3 years ago

I should have read also this one: https://iopscience.iop.org/article/10.1088/1742-6596/1418/1/012006/pdf

HarikalarKutusu commented 3 years ago

Some follow-up:

I tried the suggested location-based ground+drone image sets, but unfortunately that did not work either. The pole you suggested and/or low-altitude drone imaging seems to be a must in these cases.

I tried the cropping method I mentioned: a digital zoom onto the small building in the middle of the square, which is well defined in the scene. I used 1920x1080 crops from the 20 Mpx images and 4K video extracts and removed the EXIF. The building filled about 1/9th of the crop; I could not zoom any further. When merged with another drone set, the crops got reconstructed just fine, but they did not work with the ground images either. I used multiple combinations of describers to no avail.

I reconstructed the whole set with dsp-sift as you suggested. It actually performed better, both in the number of features extracted and in calculation time, also resulting in a better mesh, although different from sift+akaze...

The main positive difference came from the image sets extracted from the drone videos. The drone images were taken at 92-99 m altitude, but the videos were shot at around 50 m. Although lower resolution, the buildings appear larger in them. There is some junk here and there, but the result is better for some use cases.

image

image

The same area from the ground images is much better of course (except for the ground and roofs)... Even phone booths are reconstructed.

image

Now I will check whether adding akaze makes any difference; it will take a couple of days though... Next, I will try merging them.

natowi commented 3 years ago

The drone images were taken at 92-99 m altitude, but the videos were shot at around 50 m.

99 m is quite a high altitude for drone image surveying when you want to merge the dataset with a ground survey. I usually fly at an altitude between 20 m and 50 m.
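For a rough idea of what the altitude costs in resolution, here is a back-of-the-envelope ground sampling distance calculation (the Phantom 4 Pro numbers below are approximate published specs, so treat them as assumptions and check your own EXIF):

```python
# GSD = (sensor width * flight altitude) / (focal length * image width).
# Approximate Phantom 4 Pro values: 13.2 mm sensor width, 8.8 mm focal
# length, 5472 px image width.
sensor_width_mm = 13.2
focal_length_mm = 8.8
image_width_px = 5472

def gsd_cm_per_px(altitude_m):
    return (sensor_width_mm * altitude_m * 100) / (focal_length_mm * image_width_px)

for alt in (99, 50, 30):
    print(f"{alt:3d} m altitude -> {gsd_cm_per_px(alt):.1f} cm per pixel")
```

Roughly 2.7 cm per pixel at 99 m versus about 1.4 cm at 50 m, which matches your observation that the video frames resolve the facades better despite the lower pixel count.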

Did you combine the two drone image sets? The result looks quite good. I would keep these results and manually merge them with the ground survey outside of Meshroom. I think one more complication with your ground images is that the trees have quite prominent green leaves, whereas in the drone footage the trees don't have leaves.

Could you share a portion of the dataset (a small area covered by drone images, ground images, and crops) with me for testing? I want to check if I can get the images to match by modifying some parameters.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 2 years ago

This issue is closed due to inactivity. Feel free to re-open if new information is available.