NeoGeographyToolkit / StereoPipeline

The NASA Ames Stereo Pipeline is a suite of automated geodesy & stereogrammetry tools designed for processing planetary imagery captured from orbiting and landed robotic explorers on other planets.
Apache License 2.0

MGM Correlation fails, complaining GDAL cannot write to that tif file #412

Open ShashankBice opened 7 months ago

ShashankBice commented 7 months ago

Describe the bug

During the correlation stage of MGM on a stereo pair, correlation fails for a few tiles with the following error, and the program exits.

2023-10-30 21:50:05 {0} [ console ] : Will refine the disparity using the ASP subpixel-mode: 9.
2023-10-30 21:50:05 {0} [ console ] : Using session: csmmapcsm
2023-10-30 21:50:05 {0} [ console ] : Loading csm cameras used in mapprojection.
2023-10-30 21:50:05 {0} [ console ] : Mapprojected images bundle adjustment prefix: ""
2023-10-30 21:50:05 {0} [ console ] : Left camera file used in mapprojection: csm_aligned2refdem-csm-1050010001B64900.r100.adjusted_state.json
2023-10-30 21:50:05 {0} [ console ] : Right camera file used in mapprojection: csm_aligned2refdem-csm-1050010001B64A00.r100.adjusted_state.json
2023-10-30 21:50:05 {0} [ console ] : Loading camera model: 1050010001B64900_ortho_0.49m.tif csm_aligned2refdem-csm-1050010001B64900.r100.adjusted_state.json
2023-10-30 21:50:05 {0} [ console ] : Loading camera model: 1050010001B64A00_ortho_0.49m.tif csm_aligned2refdem-csm-1050010001B64A00.r100.adjusted_state.json
2023-10-30 21:50:05 {0} [ console ] : Distance between camera centers in meters: 526644.
2023-10-30 21:50:05 {0} [ console ] :   --> Using no pre-processing filter with stereo algorithm: asp_mgm
2023-10-30 21:50:05 {0} [ console ] : 
[ 2023-Oct-30 21:50:05 ] : Stage 1 --> CORRELATION
2023-10-30 21:50:06 {0} [ console ] :   --> Full-res search range based on D_sub: (Origin: (-44, -44) width: 88 height: 88)
2023-10-30 21:50:10 {0} [ console ] :   --------------------------------------------------
2023-10-30 21:50:10 {0} [ console ] :      Kernel size:    Vector2(7,7)
2023-10-30 21:50:10 {0} [ console ] :      Search range:   (Origin: (-44, -44) width: 88 height: 88)
2023-10-30 21:50:10 {0} [ console ] :      Cost mode:      3
2023-10-30 21:50:10 {0} [ console ] :   --------------------------------------------------
2023-10-30 21:50:10 {0} [ console ] : Writing: dem_ortho_0.49m/20151018_0514_1050010001B64900_1050010001B64A00-2048_101376_1024_1024/2048_101376_1024_1024-D.tif
2023-10-30 22:03:25 {0} [ fileio ] : Error: GdalIO: _tiffSeekProc:Cannot send after transport endpoint shutdown (code = 1)
2023-10-30 22:03:25 {0} [ fileio ] : Error: GdalIO: TIFFScanlineSize64:Computed scanline size is zero (code = 1)
2023-10-30 22:03:25 {0} [ fileio ] : Error: GdalIO: dem_ortho_0.49m/20151018_0514_1050010001B64900_1050010001B64A00-2048_101376_1024_1024/2048_101376_1024_1024-D.tif: Bogus block size; unable to allocate a buffer. (code = 1)
2023-10-30 22:03:25 {0} [ fileio ] : Error: GdalIO: TIFFScanlineSize64:Computed scanline size is zero (code = 1)
2023-10-30 22:03:25 {0} [ fileio ] : Error: GdalIO: dem_ortho_0.49m/20151018_0514_1050010001B64900_1050010001B64A00-2048_101376_1024_1024/2048_101376_1024_1024-D.tif: Bogus block size; unable to allocate a buffer. (code = 1)
2023-10-30 22:03:25 {0} [ fileio ] : Error: GdalIO: TIFFScanlineSize64:Computed scanline size is zero (code = 1)
2023-10-30 22:03:25 {0} [ fileio ] : Error: GdalIO: dem_ortho_0.49m/20151018_0514_1050010001B64900_1050010001B64A00-2048_101376_1024_1024/2048_101376_1024_1024-D.tif: Bogus block size; unable to allocate a buffer. (code = 1)
2023-10-30 22:03:25 {0} [ fileio ] : Error: GdalIO: dem_ortho_0.49m/20151018_0514_1050010001B64900_1050010001B64A00-2048_101376_1024_1024/2048_101376_1024_1024-D.tif: FillEmptyTiles() failed because panByteCounts == NULL (code = 1)
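When several tiles fail, it can help to pull the distinct failing files and the underlying errors out of a long log. Below is a minimal Python sketch using only the standard library. The regular expressions assume the exact `[ fileio ] : Error: GdalIO: ... (code = N)` layout shown above, and reading the four numbers in a tile name as begin_x, begin_y, width, height is an assumption, not something stated in this thread. `summarize_gdal_errors` and `tile_geometry` are hypothetical helper names.

```python
import re
from collections import Counter

# Matches log lines such as:
#   2023-10-30 22:03:25 {0} [ fileio ] : Error: GdalIO: <message> (code = 1)
GDAL_ERROR = re.compile(r"\[ fileio \] : Error: GdalIO: (?P<msg>.*) \(code = \d+\)$")

# Assumption: a tile file is named <begin_x>_<begin_y>_<width>_<height>-D.tif
TILE_NAME = re.compile(r"(\d+)_(\d+)_(\d+)_(\d+)-D\.tif")


def summarize_gdal_errors(lines):
    """Return (Counter of failing .tif paths, list of messages naming no file)."""
    failing_files = Counter()
    other_messages = []
    for line in lines:
        match = GDAL_ERROR.search(line.rstrip("\n"))
        if not match:
            continue  # not a GdalIO error line
        msg = match.group("msg")
        # Messages that name a file look like "<path>.tif: <reason>".
        path, sep, _reason = msg.partition(".tif: ")
        if sep:
            failing_files[path + ".tif"] += 1
        else:
            other_messages.append(msg)
    return failing_files, other_messages


def tile_geometry(path):
    """Return the (assumed) begin_x, begin_y, width, height from a tile file name."""
    match = TILE_NAME.search(path)
    return tuple(int(v) for v in match.groups()) if match else None
```

Usage: read the log with `open(...).readlines()` and pass the lines in; the `Counter` shows which tiles failed and how many errors each produced, while the second list keeps errors such as the `_tiffSeekProc` one that do not name a file.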

Below is a low-resolution version of the left image (rendered when the issue is viewed on GitHub), along with the disparitydebug output for the D_sub file. [image]

The computation is being run on a Broadwell node, with the following version of ASP:

NASA Ames Stereo Pipeline 3.4.0-alpha
  Build ID: 00eb5eb9
  Build date: 2023-09-22

Happy to provide more info if needed, please let me know :)

Cheers, Shashank

ShashankBice commented 7 months ago

I realize that this could also be due to very large search ranges over steep terrain. I am now trying again with the search range limited to reasonable values and will report on how that goes.
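For reference, the search range logged above spans 88 x 88 pixels with origin (-44, -44). The sketch below converts a logged range of that form into four bounds and clamps them to a chosen limit. It assumes the box covers [origin, origin + size] and that the result would be passed to ASP's `--corr-search` option as `h_min v_min h_max v_max`; check the option's exact semantics against the ASP documentation for your version. The helper names are hypothetical.

```python
import re

# Matches the search-range format printed in the log, e.g.
#   (Origin: (-44, -44) width: 88 height: 88)
BOX = re.compile(r"Origin: \((-?\d+), (-?\d+)\) width: (\d+) height: (\d+)")


def parse_search_range(text):
    """Return (h_min, v_min, h_max, v_max) from a logged search-range box.

    Assumes the box spans [origin, origin + size] along each axis.
    """
    m = BOX.search(text)
    if m is None:
        raise ValueError(f"no search range found in: {text!r}")
    x, y, w, h = (int(g) for g in m.groups())
    return x, y, x + w, y + h


def clamp_search_range(box, limit):
    """Clamp every bound of the box to the interval [-limit, limit] pixels."""
    return tuple(max(-limit, min(limit, v)) for v in box)


def corr_search_arg(box):
    """Format a box as four space-separated integers: h_min v_min h_max v_max."""
    return " ".join(str(v) for v in box)
```

Clamping every bound to the same limit is the simplest choice; a terrain-aware limit would likely differ between the horizontal and vertical axes, and too tight a clamp can cut off true disparities on steep slopes.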

oleg-alexandrov commented 7 months ago

Does this work with a different testcase?

This looks like a filesystem error. It is failing to write a file to disk.
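One way to probe the filesystem hypothesis is the first error in the log: `_tiffSeekProc:Cannot send after transport endpoint shutdown` appears to embed a C-library error string. On Linux with glibc, that text is the `strerror` message for `ESHUTDOWN`, a socket-level error, which would be consistent with (though not proof of) a dropped connection to a network filesystem rather than a local disk problem. The hypothetical helper below maps such a message back to an errno symbol on whatever platform it runs on; the strings differ between C libraries, so it is best run on the machine that produced the log.

```python
import errno
import os


def errno_name_for_message(message):
    """Return the errno symbol whose os.strerror() text equals message, or None.

    The mapping is platform-specific because the strings come from the C library.
    """
    for code, name in sorted(errno.errorcode.items()):
        if os.strerror(code) == message:
            return name
    return None
```

On a glibc system, `errno_name_for_message("Cannot send after transport endpoint shutdown")` should return `"ESHUTDOWN"`.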

ShashankBice commented 7 months ago

Yes, I have used the same program with data over the same area in the past week without any issues. I will keep looking into it and see what I find :slightly_smiling_face:

dshean commented 7 months ago

Hmm. @ShashankBice can you confirm that this is not an issue of filling the disk or exceeding hard quota (preventing additional writes to disk)?
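A quick first check for the disk-full case is the filesystem's free space and free inodes. The sketch below is only an illustration under stated assumptions: it uses POSIX `os.statvfs`, which reports filesystem-wide numbers and does not see per-user or per-group hard quotas, so a quota check still needs the site's own tool (on a Lustre filesystem this is typically `lfs quota`). `filesystem_usage` is a hypothetical name.

```python
import os


def filesystem_usage(path):
    """Return (used fraction of space, used fraction of inodes) for path's filesystem.

    POSIX only. Filesystem-wide numbers; per-user or per-group quotas are not visible.
    """
    st = os.statvfs(path)
    total_bytes = st.f_blocks * st.f_frsize
    free_bytes = st.f_bavail * st.f_frsize  # space available to unprivileged users
    used_space = 1.0 - free_bytes / total_bytes if total_bytes else 0.0
    # Some filesystems report f_files == 0 when inode counts are not meaningful.
    used_inodes = 1.0 - st.f_favail / st.f_files if st.f_files else 0.0
    return used_space, used_inodes
```

For example, `filesystem_usage("dem_ortho_0.49m")` would report on the filesystem holding the output directory from the log above.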

@oleg-alexandrov, FYI, I noticed a change to the GeoTIFF driver in the GDAL 3.8.0 release notes (https://github.com/OSGeo/gdal/blob/master/NEWS.md):

Performance improvement: avoid using block cache when writing whole blocks (up to about twice faster in some scenarios)

oleg-alexandrov commented 7 months ago

That performance improvement should be nice, and hopefully should not break anything. We are at GDAL 3.5.3 now.

ShashankBice commented 7 months ago

I can confirm I am well within quota limits both in terms of disk space and file counts. Looking into it more!

ShashankBice commented 7 months ago

I just reran this single tile on its own, outside of parallel_stereo, with the same processing parameters, and it completed. I am not sure how to solve this issue now: I submitted the job twice before reporting it here, and each time some tile failed after 10 hours of correlation. I have been monitoring the memory usage, and that is not an issue.
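Since a single tile succeeds on its own, one workaround to try is rerunning only the tiles whose output is missing or damaged. The sketch below lists such tiles under a run directory. It assumes one subdirectory per tile containing a `*-D.tif` file, as the paths in the log suggest, and it only checks the 4-byte TIFF/BigTIFF header, so a file truncated after a valid header would still pass; opening each file with `gdalinfo` would be a stronger check. `tiles_needing_rerun` is a hypothetical name, not an ASP tool.

```python
from pathlib import Path

# Classic TIFF and BigTIFF signatures, in little- and big-endian byte order.
TIFF_MAGICS = (b"II*\x00", b"MM\x00*", b"II+\x00", b"MM\x00+")


def looks_like_tiff(path):
    """Cheap check: the file can be opened and starts with a TIFF/BigTIFF header."""
    try:
        with open(path, "rb") as f:
            return f.read(4) in TIFF_MAGICS
    except OSError:
        return False


def tiles_needing_rerun(run_dir, tile_prefix="", pattern="*-D.tif"):
    """Return names of tile directories whose disparity output is missing or bad.

    Assumes (hypothetically) one subdirectory per tile whose name starts with
    tile_prefix, each expected to hold file(s) matching pattern.
    """
    bad = []
    tile_dirs = sorted(
        p for p in Path(run_dir).iterdir() if p.is_dir() and p.name.startswith(tile_prefix)
    )
    for tile_dir in tile_dirs:
        outputs = list(tile_dir.glob(pattern))
        if not outputs or not all(looks_like_tiff(p) for p in outputs):
            bad.append(tile_dir.name)
    return bad
```

Passing the run prefix as `tile_prefix` keeps unrelated subdirectories (for example, log folders) out of the result.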

I think I will report this to the NAS folks to see if they have any advice or if this is a known issue. I wanted to post an update here about this finding.

Cheers, Shashank