USF-IMARS / wv-land-cover

:earth_americas: Processing scripts for decision-tree land use classification on worldview 2 imagery
5 stars 5 forks source link

mosaic all geotiffs #17

Open 7yl4r opened 5 years ago

7yl4r commented 5 years ago

I think it is about time we start talking about how to mosaic all these geotiffs. For now let's will focus on smushing all images into one and we can discuss more complex mosaics later.

user needs/wants

Firstly I want to make sure I understand what users would want given no technical limitations. Here are my starting assumptions; I will update as we discuss further:

  1. data served over a WMS
  2. download files of their regional subset tile. Including
    1. full-resolution arbitrary subset
    2. preview quick-view to determine if they even want the file

...is that really it?

technical considerations

  1. all these images probably won't do well as one big file. We need to mosaic a bunch of images into a set of tiles one tile at a time.
  2. In addition to full-resolution, we probably do want to create downsampled overview images.
  3. we probably want to be sure this gets run on something close to the data files else we may suffer terrible network bottlenecks

solution comparison/discussion

Now I will list resources/options I have found, sorted by my preference, with pro/con (:green_heart:, :broken_heart:) lists.

  1. GDAL virt-dataset then overview-image
  2. gdal_merge
  3. gdal_warp
    • :green_heart: example
    • :green_heart: we're already using gdal
  4. python rasterio merge()
    • :green_heart: docs
  5. QGIS
    • :green_heart: OSS
    • :broken_heart: no CLI(?)
    • :broken_heart: RAM limitations?
  6. ArcGIS
    • :broken_heart: $$$
    • :broken_heart: no CLI(?)
    • :broken_heart: RAM limitations?
7yl4r commented 5 years ago

Heyyyyy neat: creating a virt-dataset then using gdaladdo (1) worked!

Here's the west FL Peninsula rendered w/ nearest-neighbor downsampled by a factor of 2048:

image

Took about 40m:

[root@thing2 west_fl_pen]# time gdaladdo -ro wv2_classif_2019-08-15.vrt 2048
0...10...20...30...40...50...60...70...80...90...100 - done.

real    41m34.946s
user    28m5.781s
sys     1m45.954s
7yl4r commented 5 years ago

gdaladdo gets you to the grayscale geotiff anyway. To get the pseudocolor png above I loaded the file into a QGIS project and then did a "snapshot" w/ the CLI:

tylar@tylardesk:~/tmp$ qgis --noplugins --snapshot test.png --project ./testwv.qgz

I'm not yet sure how to automate the creation of a QGIS project.

7yl4r commented 5 years ago

Perhaps better than using QGIS for this step would be to use rasterio to modify the geotiff example.

Note also that gdal2tiles is the perfect utility for generating rgb png tiles from one large geotiff.

7yl4r commented 5 years ago

Just for fun let's try making a giant mosaic:

[root@thing2 west_fl_pen]# time gdaladdo -ro -r average  wv2_classif_2019-08-15.vrt 1
ERROR 6: The overview file would be larger than 4GB
but this is the largest size a TIFF can be, and BigTIFF is unavailable.
Creation failed.
Overview building failed.

real    0m0.070s
user    0m0.031s
sys     0m0.012s

[root@thing2 west_fl_pen]# time gdaladdo -ro -r average  wv2_classif_2019-08-15.vrt 2
ERROR 6: The overview file would be larger than 4GB
but this is the largest size a TIFF can be, and BigTIFF is unavailable.
Creation failed.
Overview building failed.

real    0m0.039s
user    0m0.028s
sys     0m0.010s

[root@thing2 west_fl_pen]# time gdaladdo -ro -r average  wv2_classif_2019-08-15.vrt 4
0...10...20...30...40...50...60...70...80...90...100 - done.

real    80m3.317s
user    61m59.031s
sys     4m2.996s

[root@thing2 ~]# ls -lh /thing2/imars-objects/west_fl_pen/wv2_classif_2019-08-15.vrt.ovr 
-rw-r--r--. 1 root root 2.8G Aug 16 22:32 /thing2/imars-objects/west_fl_pen/wv2_classif_2019-08-15.vrt.ovr

Neat!

7yl4r commented 5 years ago

Example tile generation:

[root@thing2 west_fl_pen]# time gdal2tiles.py -p raster wv2_classif_2019-08-15.vrt.ovr wv2_classif_tiles
Generating Base Tiles:
0...10...20...30...40...50...60...70...80...90...100 - done.
Generating Overview Tiles:
0...10...20...30...40...50...60...70...80...90...100 - done.

real    8m33.702s
user    6m26.070s
sys 0m30.779s

[root@thing2 west_fl_pen]# du -sh wv2_classif_tiles/
318M    wv2_classif_tiles/

This is all going really surprisingly smoothly. The only hurdle remaining is to enable BigTiff so we can mosaic at 1:1 scale.

7yl4r commented 4 years ago

creating mosaic for circe images:

[root@cozumel imars_objects]# time gdalbuildvrt circe-ingest_tx_2019-10.vrt ./ftp-ingest/circe_ingest/*.tif
0...10...20...30ERROR 4: `./ftp-ingest/circe_ingest/WV02_20120902174937_103001001B8B5700_12SEP02174937-M1BS-500076325020_01_P017_NSF_SWTX_Map_filt_2_Cloudy.tif' not recognised as a supported file format.

Warning 1: Can't open ./ftp-ingest/circe_ingest/WV02_20120902174937_103001001B8B5700_12SEP02174937-M1BS-500076325020_01_P017_NSF_SWTX_Map_filt_2_Cloudy.tif. Skipping it
ERROR 4: `./ftp-ingest/circe_ingest/WV02_20120902174939_103001001B8B5700_12SEP02174939-M1BS-500076325020_01_P019_NSF_SWTX_Map_filt_2_Cloudy.tif' not recognised as a supported file format.

Warning 1: Can't open ./ftp-ingest/circe_ingest/WV02_20120902174939_103001001B8B5700_12SEP02174939-M1BS-500076325020_01_P019_NSF_SWTX_Map_filt_2_Cloudy.tif. Skipping it
...40...50...60...70...80...90...100 - done.

real    0m0.303s
user    0m0.209s
sys     0m0.057s

[root@cozumel imars_objects]# time gdaladdo -ro circe-ingest_tx_2019-10.vrt 1
0...10...20...30...40...50...60...70...80...90...100 - done.

real    7m0.876s
user    6m36.489s
sys     0m14.799s

Resulting file is 16GB file.

A quick glance at it and it looks pretty good in QGIS, although I haven't updated my colormap to match the latest. :+1:

7yl4r commented 4 years ago

Using gdal_merge seems to work as well too, although the resulting file get created at out.tif something else (what?) gets created at circe-ingest_tx_2019-10_merge.tif.

[root@cozumel imars_objects]# time gdal_merge.py -pct -init 0 circe-ingest_tx_2019-10_merge.tif ./ftp-ingest/circe_ingest/*.tif
ERROR 4: `./ftp-ingest/circe_ingest/WV02_20120902174937_103001001B8B5700_12SEP02174937-M1BS-500076325020_01_P017_NSF_SWTX_Map_filt_2_Cloudy.tif' not recognised as a supported file format.

ERROR 4: `./ftp-ingest/circe_ingest/WV02_20120902174939_103001001B8B5700_12SEP02174939-M1BS-500076325020_01_P019_NSF_SWTX_Map_filt_2_Cloudy.tif' not recognised as a supported file format.

0...10...20...30...40...50...60...70...80...90...100 - done.

real    11m51.958s
user    7m5.857s
sys     2m30.509s
7yl4r commented 4 years ago

Doh: my arguments weren't right last time. This is better:

[root@cozumel imars_objects]# time gdal_merge.py -pct -init 0 -of tif -ot Byte -o circe-ingest_tx_2019-10_merge_byte.jpg ./ftp-ingest/circe_ingest/*.tif

Trying to optimize the output image so it isn't too big to load into ArcMap. Raw BigTiff is 18G.

I tried forcing the pixel depth with -ot Byte expecting it might be smaller, but nope.

Tried outputing to png, but got an error:

Format driver png does not support creation and piecewise writing.
Please select a format that does, such as GTiff (the default) or HFA (Erdas Imagine).

Maybe we can lean on that old mainstay imageMagick convert:

[root@cozumel imars_objects]# time convert circe-ingest_tx_2019-10_merge_byte.tif /cozumel/circe-ingest_tx_2019-10_merge_byte.png
convert: Unknown field with tag 33550 (0x830e) encountered. `TIFFReadDirectory' @ warning/tiff.c/TIFFWarnings/820.
convert: Unknown field with tag 33922 (0x8482) encountered. `TIFFReadDirectory' @ warning/tiff.c/TIFFWarnings/820.
convert: Unknown field with tag 34735 (0x87af) encountered. `TIFFReadDirectory' @ warning/tiff.c/TIFFWarnings/820.
convert: Unknown field with tag 34736 (0x87b0) encountered. `TIFFReadDirectory' @ warning/tiff.c/TIFFWarnings/820.
convert: Unknown field with tag 34737 (0x87b1) encountered. `TIFFReadDirectory' @ warning/tiff.c/TIFFWarnings/820.
convert: unable to open pixel cache `/cozumel/circe-ingest_tx_2019-10_merge_byte.png': No space left on device @ error/cache.c/OpenPixelCache/4093.
convert: memory allocation failed `/cozumel/circe-ingest_tx_2019-10_merge_byte.png' @ error/png.c/WriteOnePNGImage/8440.

real    18m12.162s
user    8m4.343s
sys     3m57.412s

...bah. not easily. That command ate up the full ~80GB of HDD space (but not RAM). I tried putting the output file on cozumel's RAID instead, but it still filled the root system disk. It might be possible to figure out where this cache is being stored and mount some more space there... but to do that I need to pin down where imageMagick is putting it's cache.

7yl4r commented 4 years ago

More detail about the image produced by gdal_merge:

[root@cozumel imars_objects]# identify -verbose circe-ingest_tx_2019-10_merge_byte.tif
Image: circe-ingest_tx_2019-10_merge_byte.tif
  Format: TIFF64 (Tagged Image File Format (64-bit))
  Class: DirectClass
  Geometry: 78097x241161+0+0
  Resolution: 72x72
  Print size: 1084.68x3349.46
  Units: PixelsPerInch
  Type: Grayscale
  Base type: Grayscale
  Endianess: MSB
  Colorspace: Gray
  Depth: 8-bit
  Channel depth:
    gray: 8-bit
  Channel statistics:
    Gray:
      min: 0 (0)
      max: 66 (0.258824)
      mean: 13.8799 (0.0544311)
      standard deviation: 24.498 (0.0960704)
      kurtosis: 0.0232111
      skewness: 1.36115
  Colors: 12
  Histogram:
13502391534: (  0,  0,  0) #000000 gray(0,0,0)
 134277303: (  1,  1,  1) #010101 gray(1,1,1)
 179070096: ( 10, 10, 10) #0A0A0A gray(10,10,10)
1052597202: ( 20, 20, 20) #141414 gray(20,20,20)
1033770155: ( 50, 50, 50) #323232 gray(50,50,50)
 677571603: ( 60, 60, 60) #3C3C3C gray(60,60,60)
   1623399: ( 61, 61, 61) #3D3D3D gray(61,61,61)
  35267457: ( 62, 62, 62) #3E3E3E gray(62,62,62)
  64867990: ( 63, 63, 63) #3F3F3F gray(63,63,63)
 241259511: ( 64, 64, 64) #404040 gray(64,64,64)
1861478031: ( 65, 65, 65) #414141 gray(65,65,65)
  49776336: ( 66, 66, 66) #424242 gray(66,66,66)
  Rendering intent: Undefined
  Gamma: 1
  Interlace: None
  Background color: gray(255,255,255)
  Border color: gray(223,223,223)
  Matte color: gray(189,189,189)
  Transparent color: gray(0,0,0)
  Compose: Over
  Page geometry: 78097x241161+0+0
  Dispose: Undefined
  Iterations: 0
  Compression: None
  Orientation: TopLeft
  Properties:
    date:create: 2019-10-28T18:31:50+00:00
    date:modify: 2019-10-28T18:31:50+00:00
    signature: 9f3827f1c6efaceac6ae368bcc03956731c47b31e29ea2c7911a6314d6aab8bd
    tiff:endian: lsb
    tiff:photometric: min-is-black
    tiff:rows-per-strip: 1
  Artifacts:
    filename: circe-ingest_tx_2019-10_merge_byte.tif
    verbose: true
  Tainted: False
  Filesize: 18.838GB
  Number pixels: 18.834G
  Pixels per second: 99.63MB
  User time: 98.630u
  Elapsed time: 3:10.029
  Version: ImageMagick 6.7.8-9 2016-06-16 Q16 http://www.imagemagick.org
identify: Unknown field with tag 33550 (0x830e) encountered. `TIFFReadDirectory' @ warning/tiff.c/TIFFWarnings/820.
identify: Unknown field with tag 33922 (0x8482) encountered. `TIFFReadDirectory' @ warning/tiff.c/TIFFWarnings/820.
identify: Unknown field with tag 34735 (0x87af) encountered. `TIFFReadDirectory' @ warning/tiff.c/TIFFWarnings/820.
identify: Unknown field with tag 34736 (0x87b0) encountered. `TIFFReadDirectory' @ warning/tiff.c/TIFFWarnings/820.
identify: Unknown field with tag 34737 (0x87b1) encountered. `TIFFReadDirectory' @ warning/tiff.c/TIFFWarnings/820.
7yl4r commented 4 years ago

So although this has worked well enough so far, we need to find a way to overlap using the "max" function. I can't see a way to do this using gdal_merge, but I did find this discussion on how to do exactly that. Of the suggested solutions I think using SAGA GIS mosaicking is going to be the best choice. The gdal-only solution there is tempting for simplicity of deployment but it is kludgey. Besides: we might find other uses for SAGA once it is available on our systems.

7yl4r commented 4 years ago

I did some more work on this and found a few things:

7yl4r commented 4 years ago

Accomplished using gdaladdo by manually adding the following to the .vrt file:

  <VRTRasterBand dataType="Byte" band="1" subClass="VRTDerivedRasterBand">
    <PixelFunctionType>maximu</PixelFunctionType>
    <PixelFunctionLanguage>Python</PixelFunctionLanguage>
    <PixelFunctionCode><![CDATA[
import numpy as np
def maximu(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
                   raster_ysize, buf_radius, gt, **kwargs):
    np.round_(np.clip(np.max(in_ar, axis = 0),0,255),
              out = out_ar)
]]>
    </PixelFunctionCode>

This is described here. It took <30s on 5 files.

[tylar@tylar-pc gdal_mosaic_reduce]$ export GDAL_VRT_ENABLE_PYTHON=YES
[tylar@tylar-pc gdal_mosaic_reduce]$ time gdaladdo -ro circe_ingest-tx-2019_12-edited.vrt 1
Warning: Overview with subsampling factor of 1 requested. This will copy the full resolution dataset in the overview!
0...10...20...30...40...50...60...70...80...90...100 - done.

real    0m31.613s
user    0m26.524s
sys 0m4.998s
7yl4r commented 4 years ago

solution documented at https://gist.github.com/7yl4r/d03f9617212db5efded1f8a0d34550d3

7yl4r commented 4 years ago

I just tried running gdaladdo for the SE TX region and found that gdaladdo is RAM-limited. My RAM and swap both filled quickly and then the process was killed. We are going to need to try this on a supercomputer. I don't know how to estimate the amount of RAM needed; I am worried it could be in the terabytes.

Also worth noting that gdaladdo seems to be single-threaded; not ideal.

7yl4r commented 4 years ago

I'm now attempting to use gdalwarp. I ran a test using 70 images and now am attempting with all images in SE_TX.

[tylar@tylar-pc CETX]$ gdalwarp images/WV*tif -r max /mnt/texas_central/gdalwarp_output.tif
Creating output file that is 327131P x 457567L.
Processing images/WV02_20100101171245_1030010003392800_10JAN01171245-M1BS-500058889180_01_P001_CETX_SOALCHI_filt_3.tif [1/1016] : 0...10...20..
...

A few wv03 images threw an "operation not permitted" error and I moved them from /images/ to /mved/.

7yl4r commented 4 years ago

I just ran gdalwarp for the texas_central region:

[tylar@tylar-pc CETX]$ gdalwarp images/WV*tif -r max /mnt/texas_central/gdalwarp_output.tif

1016 images. It took a few days.

Final file is 140G large. Here's a screenshot:

image

I suspect the tile differences are less profound when calculated using the 'max' function. I thought that was what gdalwarp was doing, but I think the -r argument actually refers to scaling and stretching of images; not overlay.

7yl4r commented 3 years ago

this is not so important now that we have GEE working to mosaic on the fly