centreborelli / s2p

Satellite Stereo Pipeline
GNU Affero General Public License v3.0

Out of Memory for big geotiffs #88

Open kanishk-aidash opened 3 years ago

kanishk-aidash commented 3 years ago

Hi,

After the recent fixes, I am trying to run the s2p module on 1-band geotiffs (~50 cm resolution). The rasters are approximately 20000 x 20000 pixels. The process fails at the stereo matching step with the following errors:

Screenshot 2021-04-19 at 11 19 22 AM

This is caused by the system running out of memory.

Screenshot 2021-04-19 at 11 18 58 AM

System Details:
Distributor ID: Ubuntu
Description: Ubuntu 20.04.2 LTS
Release: 20.04
Codename: focal
Linux ip-172-31-3-184 5.4.0-1038-aws #40-Ubuntu SMP Fri Feb 5 23:50:40 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Screenshot 2021-04-19 at 11 42 41 AM

I have tried changing the timeouts, the tile sizes, and the available matching algorithms; all runs hit the same issue.

Do I need to add more memory, or is there another solution I should try?

mnhrdt commented 3 years ago

s2p should run without problems for input images of arbitrary size, even on a system with small memory. First, try setting the option "max_processes" to 1 in the config file and see if you get the same error.

If this doesn't solve the problem, can you share the config file and the contents of the folder "/.../pair1/" mentioned in your error message?

kanishk-aidash commented 3 years ago

Hi @mnhrdt

Ran with max_processes = 1, still same error.

Screenshot 2021-04-20 at 1 54 52 PM

My config file:

{
    "out_dir":"/Users/kanishkvarshney/Downloads/dsm_data_aidash/output_dir/08745eee-35c1-4bd6-8d93-becbb92e05ed/pair1",
    "images":[
        {
            "img":"/Users/kanishkvarshney/Downloads/dsm_data_aidash/pair1/img1.TIF",
            "rpc":"/Users/kanishkvarshney/Downloads/dsm_data_aidash/pair1/rpc1.XML"
        },
        {
            "img":"/Users/kanishkvarshney/Downloads/dsm_data_aidash/pair1/img2.TIF",
            "rpc":"/Users/kanishkvarshney/Downloads/dsm_data_aidash/pair1/rpc2.XML"
        }
    ],
    "full_img":true,
    "dsm_resolution":0.5,
    "disp_range_method":"sift",
    "tile_size":600,
    "horizontal_margin":20,
    "vertical_margin":5,
    "timeout":7200,
    "clean_intermediate":true,
    "matching_algorithm":"mgm_multi",
    "mgm_timeout":7200,
    "max_processes":1
}

../pair1/ folder contains the 1-band stereo pairs (geotiffs) and corresponding RPC.XMLs

Screenshot 2021-04-20 at 1 53 34 PM Screenshot 2021-04-20 at 1 54 03 PM
mnhrdt commented 3 years ago

Sorry, I meant "pair_1" not "pair1". It's the temporary folder where the particular tiles reside that triggered the error. Its full name appears in the subprocess call to mgm_multi. Something like ./output/tiles/row_XXX/col_XXX/pair_1

This should be a folder with two rectified small tiles of size 600x600 that you can share by zipping the whole directory.

kanishk-aidash commented 3 years ago

@mnhrdt I am uploading the zip (and the error) for a new run, since the output directory was overwritten by newer runs. The flow breaks on a random tile during stereo matching, with the same error.

This one is from a run with 800x800 tiles. I have been trying different configurations, but all of them break with the same error.

Screenshot 2021-04-21 at 6 45 06 PM

pair_1.zip

mnhrdt commented 3 years ago

You have probably reached a memory limit... I have run this tile on my laptop and it takes almost 4GB of memory at one point. Can you try running the following command inside the "pair_1" folder and see what happens:

/path/to/your/install/of/s2p/bin/mgm_multi -r -109 -R 122 -S 6 -s vfit -t census -O 8 -P1 8.0 -P2 32.0 -confidence_consensusL rectified_disp_confidence.tif rectified_ref.tif rectified_sec.tif rectified_disp.tif

If it fails, you can try closing all other applications on your computer (the browsers may suffice) and then it may work. That would mean that it is indeed an out of memory error that we can try to solve, or at least try to get around.
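To put a number on the peak memory of a single command like the one above, a minimal sketch along these lines may help. It assumes a Unix system (the `resource` module is not available on Windows), and the helper name `run_and_report_peak` is made up for illustration. The child process here is a Python stand-in; the real `mgm_multi` call would be passed as the argument list instead.

```python
import resource
import subprocess
import sys


def run_and_report_peak(cmd, cwd=None):
    """Run cmd to completion; return (exit code, peak RSS in MiB).

    RUSAGE_CHILDREN reports the largest resident set size among all child
    processes this interpreter has waited for so far, so call this from a
    fresh interpreter to measure one command in isolation.
    """
    proc = subprocess.run(cmd, cwd=cwd)
    maxrss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return proc.returncode, maxrss / divisor


# Stand-in child process; replace with the mgm_multi argument list,
# and pass cwd="<path to the pair_1 folder>".
code, peak_mib = run_and_report_peak(
    [sys.executable, "-c", "data = bytes(50_000_000)"]
)
print(f"exit code {code}, peak RSS about {peak_mib:.0f} MiB")
```

A negative exit code means the child was killed by a signal (for example -6 for SIGABRT), which is one way an out-of-memory abort shows up.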

kanishk-aidash commented 3 years ago

Hey @mnhrdt The process was running on an 8GB, 4-core EC2 Ubuntu 20.04 machine dedicated only to stereo matching. Nothing else is running on that system. As mentioned above, the tile on which the SIGABRT happens isn't consistent. The run can go OOM on any of the tiles during the stereo matching step.

As a workaround, I have started the algorithm on a new 32GB system with max_processes = 1. The process has been running for more than 12 hours now and only about half the tiles (~600 / 1056) are stereo matched so far.

gfacciol commented 3 years ago

The disparity range is not being estimated because of lack of sift matches. Could you try running the pipeline by setting the option: cfg['sift_match_thresh'] = 0.8 ?

carlodef commented 3 years ago

@kanishk-aidash to speed things up while keeping the memory usage as low as possible during the stereo matching step, you can remove the max_processes parameter from the input json file, and replace it with these two parameters:

"max_processes_stereo_matching": 1,
"omp_num_threads": 8,
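
As a rough sketch of that change applied programmatically (the helper name `limit_stereo_matching_processes` is hypothetical; only the three parameter names come from this thread):

```python
import json


def limit_stereo_matching_processes(cfg, omp_threads=8):
    """Return a copy of cfg with max_processes replaced by the two
    parameters suggested above. The input dict is left untouched."""
    new_cfg = dict(cfg)
    new_cfg.pop("max_processes", None)          # drop the global limit
    new_cfg["max_processes_stereo_matching"] = 1  # one matching job at a time
    new_cfg["omp_num_threads"] = omp_threads      # threads for that one job
    return new_cfg


# Shortened stand-in for the config posted earlier in this thread.
cfg = {"tile_size": 600, "matching_algorithm": "mgm_multi", "max_processes": 1}
print(json.dumps(limit_stereo_matching_processes(cfg), indent=4))
```

The same function can be applied to a config loaded with `json.load` and written back with `json.dump`.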
kanishk-aidash commented 3 years ago
Screenshot 2021-04-25 at 11 46 40 AM

Hey @carlodef It doesn't work. With your settings it runs out of memory on a 32GB, 8-core box (this is the new box mentioned above).

Attaching pair_1.zip for the mgm_multi command and config.json (generated by s2p in the same folder).

config_json.txt

pair_1.zip

Screenshot 2021-04-25 at 11 49 17 AM

Memory Usage (hitting 32GB ) :

Screenshot 2021-04-25 at 12 58 27 PM

The only successful run I have had so far is with 'max_processes' = 1, which takes more than 20 hours.

@gfacciol I have set cfg['sift_match_thresh'] = 0.8 as well, but it doesn't seem to help either.

gfacciol commented 3 years ago

Well, your process peaks at ~35 GB of RAM.

The reason is that sift failed to find matches and so the disparity range is set to the maximum possible range, which is set as [-700, 217]!

SUBPIX=2 mgm_multi -r -700 -R 217 -S 6 -s vfit -t census -O 8 -P1 8 -P2 32 -confidence_consensusL conf.tif rectified_ref.tif rectified_sec.tif disp.tif

The actual range for this tile is approx [-200, 150]. The black image boundaries are not helping either, because they are processed at all scales trying to find a match.

To increase the probability of finding sift matches in this image (and reduce the range) you should set: cfg['sift_match_thresh'] = 0.8

This should limit the disparity range. We're working on a solution for the case where no sift matches are found, but it's not integrated yet.
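
For a sense of scale, the two ranges quoted above can be compared directly. This is only back-of-the-envelope arithmetic, under the assumption that the memory of a dense matcher grows roughly with the number of disparity candidates per pixel; the exact memory model of mgm_multi is not stated in this thread.

```python
def range_width(disp_min, disp_max):
    """Number of integer disparity steps between the two bounds."""
    return disp_max - disp_min


fallback = range_width(-700, 217)  # range used when sift finds no matches
actual = range_width(-200, 150)    # approximate true range for this tile

print(fallback)                    # 917
print(actual)                      # 350
print(f"{fallback / actual:.2f}")  # 2.62
```

So the fallback range asks the matcher to consider about 2.6 times as many candidates as this tile needs.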

carlodef commented 3 years ago

@kanishk-aidash could you please add "use_srtm": true to the input config json file? This should help.

kanishk-aidash commented 3 years ago

Hey @gfacciol I understand the issue to some extent. High memory consumption is expected for MGM, but ~35GB is still a bit too much. I have tried running this process with a single worker as well.

I have already tried the setting you suggested, cfg['sift_match_thresh'] = 0.8, but to no avail. The last logs you are seeing were produced with this threshold.

kanishk-aidash commented 3 years ago

@kanishk-aidash could you please add "use_srtm": true to the input config json file? This should help

@carlodef
Nope, "use_srtm": true doesn't work either. The only thing that works so far is "max_processes" = 1, which takes more than 20 hours.

config_json.txt pair_1.zip

gfacciol commented 3 years ago

I agree that's a lot of memory; we're working on a fix. Meanwhile, I have a workaround for your case: change the SUBPIX parameter of the correlator from 2 to 1. The line in question currently reads:

env['SUBPIX'] = '2'

and should be changed to set '1' instead. Here it is in the source: https://github.com/cmla/s2p/blob/f7540c0723e1613992f0d3aaae01db1b208e6a03/s2p/block_matching.py#L261

This should keep the memory usage within 32 GB.
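
Since SUBPIX reaches the matcher as an environment variable (as in the `SUBPIX=2 mgm_multi ...` command earlier in this thread), the same setting can be tried on a single tile without editing the s2p source. A minimal sketch, with a hypothetical helper `env_with_subpix` and a Python child process standing in for `mgm_multi`:

```python
import os
import subprocess
import sys


def env_with_subpix(subpix=1, base_env=None):
    """Copy the environment and set SUBPIX as a string, since
    environment variable values must be strings."""
    env = dict(os.environ if base_env is None else base_env)
    env["SUBPIX"] = str(subpix)
    return env


# Stand-in child that just echoes the variable it received;
# replace the argument list with the mgm_multi command to test a tile.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['SUBPIX'])"],
    env=env_with_subpix(1),
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # 1
```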

kanishk-aidash commented 3 years ago

Hey @gfacciol

This fix lets the code run without crashing. It takes around 16 hours for a 20000x20000 single-band stereo pair.

In the generated DSM, I am seeing lots of No Data though. The input geotiff resolution is 0.5 m. ![Uploading Screenshot 2021-04-28 at 5.00.23 PM.png…]()

gfacciol commented 3 years ago

Hi @kanishk-aidash, your attachment didn't work. But some holes are expected anyway, given the density of the matching (which depends on the angle between the views). As a reference, this is the reconstruction of the tile you sent the other day:

image

kanishk-aidash commented 3 years ago

@gfacciol Updating the attachment

I expect some missing data, but here the final output sort of looks like salt-n-pepper

Screenshot 2021-04-28 at 7 41 45 PM
gfacciol commented 3 years ago

Ouch, that looks bad, but at this scale it is hard to tell, because NaNs dilate after most subsampling operations. Can you zoom in on some area at the scale of the resolution, and send it?
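
To illustrate the dilation effect, here is a small self-contained sketch. It assumes the subsampling is a plain block average, which may differ from what a given viewer actually does. A few isolated NaN pixels are enough to invalidate most of the averaged blocks.

```python
import math
import random


def block_mean(img, k):
    """Downsample a 2D list by averaging non-overlapping k x k blocks.
    Any NaN inside a block makes the whole block mean NaN."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(0, h - h % k, k):
        row = []
        for j in range(0, w - w % k, k):
            block = [img[a][b] for a in range(i, i + k) for b in range(j, j + k)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def nan_fraction(img):
    """Fraction of pixels that are NaN."""
    values = [v for row in img for v in row]
    return sum(math.isnan(v) for v in values) / len(values)


rng = random.Random(0)
# Synthetic 256 x 256 "DSM" with roughly 5% isolated NaN pixels.
full = [[math.nan if rng.random() < 0.05 else 1.0 for _ in range(256)]
        for _ in range(256)]
small = block_mean(full, 4)

print(f"NaN fraction at full resolution: {nan_fraction(full):.2f}")
print(f"NaN fraction after 4x4 averaging: {nan_fraction(small):.2f}")
```

With independent holes at a rate p, a block of 16 pixels survives with probability (1 - p)^16, so at p = 0.05 more than half of the averaged pixels are expected to be NaN.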

kanishk-aidash commented 3 years ago

Hey @gfacciol

Adding two pairs of clips: the generated DSM and the corresponding Google tile (clipped in QGIS).

data.zip

kanishk-aidash commented 3 years ago

I agree that's a lot of memory, we're working on a fix. Meanwhile I have a workaround for your case: It consists in changing the SUBPIX parameter of the correlator from 2 to 1.

env['SUBPIX'] = '2'

Here is the line in question: https://github.com/cmla/s2p/blob/f7540c0723e1613992f0d3aaae01db1b208e6a03/s2p/block_matching.py#L261

this should keep the memory usage within the 32 gb

Hey @gfacciol
Update: the fix you suggested reduces the memory usage, but on even bigger rasters (~30000x30000) it still crashes with an out-of-memory error. Memory still shoots up to 32GB.

Screenshot 2021-04-30 at 9 26 17 AM
kanishk-aidash commented 3 years ago

Update 2:

  1. Tried using the sgbm matching algorithm, but the code crashes (similar behaviour for algorithms other than mgm and mgm_multi):
Screenshot 2021-04-30 at 12 44 45 PM
  2. Have tried to restrict the disparity range via the config, but it doesn't seem to have any impact. Apparently, the config is not being used properly within the module. Settings tried:
     cfg['use_srtm'] = True
     cfg['max_processes_stereo_matching'] = 1
     cfg['omp_num_threads'] = 8
     cfg['disp_min'] = -100
     cfg['disp_min'] = 400
     cfg['disp_range_method'] == 'fixed_pixel_range'

Setting the 'max_disp_range' field gives the following error if the disparity range is not less than the one provided via the config:

Screenshot 2021-05-01 at 12 15 17 AM

Still running into OOM error

Screenshot 2021-04-30 at 9 26 17 AM
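
A quick sanity check on the fixed-range settings listed above may be worth running before the pipeline. This is only a sketch: `check_fixed_range` is a hypothetical helper, and the key name `disp_max` is an assumption, since only `disp_min` appears in the list. Whether the actual run used exactly the lines as typed in the comment is not known.

```python
def check_fixed_range(cfg):
    """Return a list of problems with a fixed disparity range config.
    An empty list means both bounds are present and ordered."""
    problems = []
    if cfg.get("disp_range_method") != "fixed_pixel_range":
        problems.append("disp_range_method is not 'fixed_pixel_range'")
    for key in ("disp_min", "disp_max"):  # 'disp_max' is an assumed key name
        if key not in cfg:
            problems.append(f"{key} is missing")
    if "disp_min" in cfg and "disp_max" in cfg and cfg["disp_min"] >= cfg["disp_max"]:
        problems.append("disp_min is not smaller than disp_max")
    return problems


# The assignments exactly as typed in the list: disp_min is set twice,
# and the method line uses '==' (a comparison), which assigns nothing.
as_listed = {}
as_listed["disp_min"] = -100
as_listed["disp_min"] = 400
print(check_fixed_range(as_listed))

# What was presumably intended.
intended = {"disp_range_method": "fixed_pixel_range",
            "disp_min": -100, "disp_max": 400}
print(check_fixed_range(intended))  # []
```

If the settings were entered as typed, the second assignment overwrites the first and no upper bound is ever set, which would be consistent with the range restriction having no effect.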