NeoGeographyToolkit / StereoPipeline

The NASA Ames Stereo Pipeline is a suite of automated geodesy & stereogrammetry tools designed for processing planetary imagery captured from orbiting and landed robotic explorers on other planets.
Apache License 2.0

Stereo step 4: Filtering failed #356

Closed zhaomumu233 closed 2 years ago

zhaomumu233 commented 2 years ago

When I used the SGM algorithm to process stereo satellite images to obtain a DEM, the program failed at "Stereo Step 4". The error message indicates that the RD.tif file for one sub-block is not in a recognized format. What causes this, and is there a way to fix it? Thank you very much for your help!

The following error information is displayed:

Error: GdalIO: '/ssdnvme/satelliteImage/Result/1-512000_30400_1600_1600/1-512000_30400_1600_1600-RD.tif' not recognised as a supported file format. (code = 4)
GDAL: Failed to open Result/1-RD.tif.
Traceback (most recent call last):
  File "/home/amax/anaconda3/envs/asp/bin/parallel_stereo", line 879, in
    normal_run('stereo_fltr', args, msg='%d: Filtering' % step)
  File "/home/amax/anaconda3/envs/asp/bin/parallel_stereo", line 546, in normal_run
    raise Exception('Stereo step ' + kw['msg'] + ' failed')
Exception: Stereo step 4: Filtering failed

oleg-alexandrov commented 2 years ago

It looks like something made the SGM algorithm crash, and it wrote a junk file.
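One way to find which tile holds the junk file is to check the first bytes of each per-tile RD.tif. The sketch below is not part of ASP; it assumes the tiles live in subdirectories directly under the run directory (as the path in the error message suggests), and the function names and the "*-RD.tif" pattern are illustrative choices. It only tests for the standard TIFF and BigTIFF signatures, so a file that passes can still be damaged further in.

```python
from pathlib import Path

# First four bytes of classic TIFF and BigTIFF files (little- and big-endian).
TIFF_MAGICS = (b"II*\x00", b"MM\x00*", b"II+\x00", b"MM\x00+")


def looks_like_tiff(path):
    """Return True if the file starts with a TIFF or BigTIFF signature."""
    with open(path, "rb") as f:
        return f.read(4) in TIFF_MAGICS


def find_suspect_tiles(run_dir, pattern="*-RD.tif"):
    """List per-tile files (one directory level down) lacking a TIFF header.

    Assumption: each tile has its own subdirectory under run_dir.
    Empty files are reported too, since they have no header at all.
    """
    return sorted(p for p in Path(run_dir).glob("*/" + pattern)
                  if p.is_file() and not looks_like_tiff(p))


if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python check_tiles.py Result
    for tile in find_suspect_tiles(sys.argv[1] if len(sys.argv) > 1 else "."):
        print("suspect:", tile)
```

Any tile reported here is a candidate for deleting and re-running, once the cause of the crash has been addressed.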

SGM can be quite memory-intensive, and this can happen if it expects more memory than what your machine has.

This can happen if, for example, your input images have clouds or very black areas that confuse it.

You can try re-running SGM with, for example, --corr-memory-limit-mb 3000, which would use only 3GB per process (approximately). Or even less. (That may result in good data being ignored too if pushed too low.)

You can also try using parallel_stereo with fewer processes, that is, a smaller value for the --processes option.

You can also run 'top' on the system and see where it is running into issues.
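To see how the two knobs above interact, here is a rough back-of-the-envelope sketch. It assumes the correlation memory limit applies approximately per process, as described above, and that nothing else on the machine needs much memory. The RAM size in the example and the 0.5 headroom factor are made-up illustration values, not ASP recommendations.

```python
def correlation_budget_mb(processes, corr_memory_limit_mb):
    """Approximate total correlation memory if every process hits its limit."""
    return processes * corr_memory_limit_mb


def fits_in_ram(processes, corr_memory_limit_mb, ram_gb, headroom=0.5):
    """True if the approximate budget stays below headroom * total RAM.

    headroom is an arbitrary safety factor chosen for this sketch; the real
    usage also includes memory outside the correlator's own limit.
    """
    budget_mb = correlation_budget_mb(processes, corr_memory_limit_mb)
    return budget_mb <= ram_gb * 1024 * headroom


if __name__ == "__main__":
    # Hypothetical machine with 64 GB of RAM.
    for procs in (4, 8, 16):
        total = correlation_budget_mb(procs, 3000)
        print(procs, "processes:", total, "MB,",
              "ok" if fits_in_ram(procs, 3000, ram_gb=64) else "too much")
```

The numbers are only a first guess; watching the actual usage in 'top' remains the real check.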

The latest daily build of ASP, from https://github.com/NeoGeographyToolkit/StereoPipeline/releases, also has a way of filtering the low-resolution disparity more aggressively, which may result in less memory usage later. For that you'd set the option --outlier-removal-params to something lower than the defaults, maybe 75.0 2.0.

You can also examine your input images, and try to see if they are very different from each other or if they have artifacts.

Also, it is possible to mapproject the images first onto a given smooth DEM, and run stereo with those. That is documented.

In short, SGM/MGM can crash with difficult images due to memory issues and hopefully some tweaks can make it work better.

zhaomumu233 commented 2 years ago

The satellite images are very similar to each other and have no clouds, black areas, or artifacts.

My computer has 256 GB of RAM and 40 threads. This time I used ASP version 3.0 and added the parameters "remove-outliers-by-disparity-params 75.0 2.0", "--threads-multiprocess 40", "--threads-singleprocess 40", and "corr-memory-limit-mb 4000". The program still fails, and the terminal reports the following error message:

Error: GdalIO: ASP3-Remove/1-36864_12288_1024_1024/36864_12288_1024_1024-RD.tif: Too many open files (code = 4)
GDAL: Failed to open ASP3-Remove/1-RD.tif.
Traceback (most recent call last):
  File "/home/amax/anaconda3/envs/asp3/bin/parallel_stereo", line 943, in
    normal_run('stereo_fltr', args, msg='%d: Filtering' % step)
  File "/home/amax/anaconda3/envs/asp3/bin/parallel_stereo", line 580, in normal_run
    raise Exception('Stereo step ' + kw['msg'] + ' failed')
Exception: Stereo step 4: Filtering failed

That is the current situation. I hope to get your advice again, thanks!

oleg-alexandrov commented 2 years ago

Your error says: "Too many open files (code = 4)"

Makes me wonder if indeed too many files are open, as each tile seems to be of size 1024 and your dimensions seem big.
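A quick way to test this suspicion is to compare the number of tiles with the per-process limit on open file descriptors. The sketch below uses Python's standard resource module (Unix only). It assumes, as a worst case, that one process holds every tile file open at the same time; the reserved margin of 32 descriptors is an arbitrary allowance for the files a process has open anyway.

```python
import resource  # standard library, available on Unix-like systems only


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def tiles_exceed_limit(num_tiles, soft_limit, reserved=32):
    """True if opening all tiles at once would pass the soft limit.

    reserved is a rough allowance (an assumption of this sketch) for
    descriptors already in use, such as stdin, stdout, and libraries.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return False
    return num_tiles + reserved > soft_limit


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("soft limit:", soft, "hard limit:", hard)
    # 1500 is a hypothetical tile count used only for illustration.
    print("1500 tiles too many?", tiles_exceed_limit(1500, soft))
```

The same soft limit is what the shell reports with "ulimit -n".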

I also wonder what other options you passed to parallel_stereo.

Maybe you can debug this by starting small.

You can do a little clip first. You can run:

stereo_gui --stereo-algorithm asp_mgm --corr-memory-limit-mb 4000 'put here your images and cameras and output prefix'

One should not add --threads-multiprocess and --threads-singleprocess here, as stereo_gui does not support those. This will take a while to load, but it should show you a view of your images. Then you can use Control-Drag-Mouse to select a region in the left and right images, and run parallel_stereo from the Run menu. This should then run a clip. (The doc has more info in the stereo_gui Tools section.)

So, I am trying to understand if the algorithm works at all, or if you are hitting resource limits.

Then you can tell me your precise parallel_stereo command you ran which crashed, and the size of your left and right image (those can be found with gdalinfo).
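If the image sizes need to be collected in a script rather than read by eye, the "Size is" line that gdalinfo prints can be parsed directly. This is a minimal sketch: the helper names are hypothetical, the sample numbers in the demo are invented, and the second function simply shells out to the gdalinfo command, which must be on the PATH.

```python
import re
import subprocess

# gdalinfo prints a line such as "Size is 512, 256" (width, then height).
_SIZE_RE = re.compile(r"^Size is (\d+),\s*(\d+)", re.MULTILINE)


def parse_gdalinfo_size(text):
    """Extract (width, height) in pixels from gdalinfo's text output."""
    match = _SIZE_RE.search(text)
    if match is None:
        raise ValueError("no 'Size is' line found in gdalinfo output")
    return int(match.group(1)), int(match.group(2))


def gdalinfo_size(image_path):
    """Run gdalinfo on an image and return its (width, height)."""
    result = subprocess.run(["gdalinfo", image_path], check=True,
                            capture_output=True, text=True)
    return parse_gdalinfo_size(result.stdout)


if __name__ == "__main__":
    sample = "Driver: GTiff/GeoTIFF\nFiles: example.tif\nSize is 512, 256\n"
    print(parse_gdalinfo_size(sample))
```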

If it turns out that too many files are indeed open, the tile size can be increased by using larger values for --job-size-h and --job-size-w.

zhaomumu233 commented 2 years ago

This is the command I pass to parallel_stereo:

parallel_stereo A.tif B.tif ASP3-result/1 --threads-multiprocess 40 --threads-singleprocess 40 -t rpc -s ASP3-SGM-stereo.default

The content in ASP3-SGM-stereo.default is as follows:

PREPROCESSING

alignment-method affineepipolar
ip-detect-method 1
individually-normalize

CORRELATION

cost-mode 4
stereo-algorithm 1
corr-kernel 7 7
corr-memory-limit-mb 4000
remove-outliers-by-disparity-params 75.0 2.0

SUBPIXEL REFINEMENT

FILTERING

This is the raw image info I got with gdalinfo:

Files: A.tif
Size is 97792, 15784
The file size is 2.9 GB.

Files: B.tif
Size is 188416, 35264
The file size is 12 GB.

I also followed the advice you gave and used the gdal_translate tool to convert the images into a form that ASP can read more easily.

After conversion, image A is 1 GB and image B is 3.8 GB.

Then I followed your advice and used the stereo_gui tool to select a small area of the two images for the experiment. The program ran successfully, and comparing the generated DEM with the image shows that the result is correct.

My images are larger than typical ones, and I think this may be the reason for the error. Maybe I can try setting --job-size-h and --job-size-w to larger values, or directly increase corr-tile-size, to solve the "Too many open files" error.

But if I make these parameters larger, memory consumption will grow, and the program may crash again due to memory errors. So the crux of the problem may be how to balance the size of a single tile against memory consumption.
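The trade-off can be made concrete by counting tiles for a few job sizes. The sketch below assumes tiles are cut from the left image at the size reported by gdalinfo above (97792 x 15784) and ignores any padding or change in size from alignment, so the counts are estimates rather than what parallel_stereo will produce exactly.

```python
def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def tile_count(width, height, job_size_w, job_size_h):
    """Estimated number of tiles covering a width x height image."""
    return ceil_div(width, job_size_w) * ceil_div(height, job_size_h)


if __name__ == "__main__":
    width, height = 97792, 15784  # left image (A.tif) size from gdalinfo
    for job_size in (1024, 2048, 3072):
        print(job_size, "->", tile_count(width, height, job_size, job_size))
    # 1024 -> 1536 tiles, 2048 -> 384 tiles, 3072 -> 192 tiles
```

With 1024-pixel tiles the estimate is 1536 files of each kind, which is above the common default soft limit of 1024 open files on Linux. Tripling the job size cuts the count to 192, at the cost of more memory per tile.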

Thanks again for your advice and help.

oleg-alexandrov commented 2 years ago

This is good progress. So the issue is tuning things for your machine so that it does not run out of memory.

Maybe you can try:

parallel_stereo --threads-multiprocess 8 --threads-singleprocess 8 --processes 4 --job-size-h 3072 --job-size-w 3072

This way it should use 32 out of your 40 cores (4 processes, 8 threads per process), which is not too bad. Then you can monitor memory usage with the "top" command in a different terminal. Later, if it turns out that the memory usage is fine, you can try setting --processes to 5 or more.
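The core arithmetic behind this suggestion is simple enough to script when trying several combinations. A small sketch, with hypothetical helper names, assuming each process keeps all of its threads busy:

```python
def cores_in_use(processes, threads_per_process):
    """Cores occupied if every process keeps all its threads busy."""
    return processes * threads_per_process


def max_processes(total_cores, threads_per_process):
    """Largest process count that does not oversubscribe the cores."""
    return total_cores // threads_per_process


if __name__ == "__main__":
    total_cores = 40
    print(cores_in_use(4, 8))             # 32 of the 40 cores
    print(max_processes(total_cores, 8))  # 5 processes fill all 40 cores
```

This only covers CPU use. Memory grows with the number of processes too, so the process count that fits in RAM may be lower than the one that fits the cores.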

I must say that for images of your size it would be nice to have maybe 2-4 machines as big as the one you have now, and run things in parallel on all of them.

It is interesting that your second image is twice as big as the first, in both width and height.

If you have totally bad luck, and nothing works, your processing job can be divided in two, using --left-image-crop-win (which is what stereo_gui was doing for you under the hood). You can choose to process the top half of your left image (while keeping the full right image), then later the bottom half of your first image (maybe with some overlap of the two), then create two DEMs for these halves and merge them with dem_mosaic.
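The two crop windows for such a split can be computed as follows. This is a sketch under the assumption that --left-image-crop-win takes four values in the order xoff yoff xsize ysize (worth confirming in the ASP documentation for your version). The 200-pixel overlap is an arbitrary example value.

```python
def split_top_bottom(width, height, overlap=0):
    """Return two (xoff, yoff, xsize, ysize) windows covering the image.

    The windows are the top and bottom halves, each extended by half of
    `overlap` rows past the midline so that they share `overlap` rows.
    """
    half = height // 2
    pad = overlap // 2
    top = (0, 0, width, min(height, half + pad))
    y0 = max(0, half - pad)
    bottom = (0, y0, width, height - y0)
    return top, bottom


def crop_win_option(window):
    """Format a window as a --left-image-crop-win argument string."""
    return "--left-image-crop-win " + " ".join(str(v) for v in window)


if __name__ == "__main__":
    # Left image size as reported by gdalinfo earlier in this thread.
    top, bottom = split_top_bottom(97792, 15784, overlap=200)
    print(crop_win_option(top))     # --left-image-crop-win 0 0 97792 7992
    print(crop_win_option(bottom))  # --left-image-crop-win 0 7792 97792 7992
```

Each window would go into its own parallel_stereo run with a separate output prefix, and the two resulting DEMs are then merged with dem_mosaic.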

BTW, we usually prefer --stereo-algorithm 2, which is MGM, which can give a little nicer results than SGM.

In short, you will have to do some experiments and figure out what works for you.

And the thing I suggested, "remove-outliers-by-disparity-params 75.0 2.0" likely did not work for you. That is a rather new option, and if it is not beneficial, or if you notice that some valid disparities get removed too aggressively (say if good areas go missing) this option can be removed or relaxed.

zhaomumu233 commented 2 years ago

I will adjust the parameters as you suggested to find a set that works for my images. I hope I can succeed.

Finally, once again sincerely thank you for your continued help.

oleg-alexandrov commented 2 years ago

I got another report about this. It is a GDAL issue, so not ours, but we'd need to change how VRTs are created for GDAL to not fail here. Maybe instead of combining a lot of small files at once, which is when GDAL would fail, they can be divided into some groups, each group combined individually, then all the outputs combined together.
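The grouping idea can be sketched in a few lines. This is only an illustration of the two-level plan, not the change that went into ASP; the group size of 256, the intermediate file names, and the tile names in the demo are hypothetical, and the actual combining step (building each VRT) is left out.

```python
def chunked(items, group_size):
    """Split a list into consecutive groups of at most group_size items."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [items[i:i + group_size] for i in range(0, len(items), group_size)]


def plan_two_level_merge(tile_files, group_size=256):
    """Plan a merge that never combines more than group_size files at once.

    Returns (group_jobs, final_inputs): each group job is a pair of a
    hypothetical intermediate name and its member tiles; final_inputs
    lists the intermediates to combine in the last step.
    """
    group_jobs = [("group_%04d.vrt" % i, members)
                  for i, members in enumerate(chunked(tile_files, group_size))]
    final_inputs = [name for name, _ in group_jobs]
    return group_jobs, final_inputs


if __name__ == "__main__":
    tiles = ["tile_%04d-RD.tif" % i for i in range(1536)]  # made-up names
    jobs, final_inputs = plan_two_level_merge(tiles)
    print(len(jobs), "groups, largest has",
          max(len(members) for _, members in jobs), "files")
```

With 1536 tiles and groups of 256, no single step touches more than 256 files, and the final step combines only 6. If the number of groups itself grew too large, the same grouping could be applied again to the intermediates.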

At the very least, ASP should throw an error early on, before doing any work, and suggest a bigger tile size to the user so that this does not fail later.

This is high on my list of things to fix, but will likely have to wait a few weeks till more time is allocated for ASP work.

oleg-alexandrov commented 2 years ago

I put in a fix for the "too many open files" issue. The fix will be in the 2022-02-09 build at https://github.com/NeoGeographyToolkit/StereoPipeline/releases.

zhaomumu233 commented 2 years ago

Thank you very much for improving the code and for your continued help. I will keep testing the code in my follow-up experiments.

ASP is very convenient and efficient, and is a huge help to those working in the field of satellite photogrammetry.

Finally, my sincere thanks again to you and your team.