WikiWatershed / mmw-geoprocessing

A Spark Job Server job for Model My Watershed geoprocessing.
Apache License 2.0
6 stars 6 forks source link

AWS 2-3: Add New Operation for RasterGroupedCount on COGs #113

Open rajadain opened 2 months ago

rajadain commented 2 months ago

Following up to #112, add a new operation that is similar to the RasterGroupedCount operation but works on COGs.

rajadain commented 1 month ago

Here's some resources that should help:

https://github.com/geotrellis/geotrellis-server/blob/main/stac-example/src/main/scala/geotrellis/server/ogc/stac/StacOgcRepositories.scala#L59-L94

https://github.com/azavea/nasa-hyperspectral/blob/3dd517dd8fd2013a114b3734b4afb34706a85a3c/src/pipeline/cog-clip/src/main/scala/com/azavea/nasa/hsi/commands/CogClip.scala#L128

https://github.com/geotrellis/geotrellis-server/tree/main/stac-example

We may need to copy over or otherwise import these STAC raster sources for them to work with GeoTrellis:

https://github.com/geotrellis/geotrellis-server/tree/main/stac/src/main/scala/geotrellis/stac/raster

These gists by @pomadchin may also be helpful:

https://gist.github.com/pomadchin/86e61dc151fbf2be9fa9f824473071b7

https://gist.github.com/pomadchin/0ecf23c468883aeae58bf68a5e2e5a5e

pomadchin commented 1 month ago

Also a tiny example here to help / isolated demo (summary of all GT Server / Avris links)

That FS2 ~ Futures on edges is a bit of a rough spot; worth considering moving server to cats-effects 3. It should be not too complex given that the server code is not that big.

GT Server artifacts are updated & published up to the most resent GT / stac4s / maml versions.

rajadain commented 1 month ago

I've got some work in this branch, mainly copied from the very handy stac-example-simple above (thank you @pomadchin!), but it's not really working.

I've implemented a sample /stac endpoint, but when I hit it with this request it fails:

time xh --verbose :8090/stac shape="{\"type\":\"MultiPolygon\",\"coordinates\":[[[[-75.220813751220703,39.932512033740309],[-75.150260925292969,39.929221201562726],[-75.120391845703125,39.924350479594167],[-75.124168395996094,39.894723763816785],[-75.192832946777344,39.89643587838578],[-75.229225158691406,39.903810652180944],[-75.224761962890625,39.928957928153828],[-75.220813751220703,39.932512033740309]]]]}"
POST /stac HTTP/1.1
Accept: application/json, */*;q=0.5
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Content-Length: 388
Content-Type: application/json
Host: localhost:8090
User-Agent: xh/0.22.0

{
    "shape": "{\"type\":\"MultiPolygon\",\"coordinates\":[[[[-75.220813751220703,39.932512033740309],[-75.150260925292969,39.929221201562726],[-75.120391845703125,39.924350479594167],[-75.124168395996094,39.894723763816785],[-75.192832946777344,39.89643587838578],[-75.229225158691406,39.903810652180944],[-75.224761962890625,39.928957928153828],[-75.220813751220703,39.932512033740309]]]]}"
}

HTTP/1.1 200 OK
Content-Length: 111
Content-Type: text/plain; charset=UTF-8
Date: Wed, 15 May 2024 17:44:35 GMT
Server: akka-http/10.5.3

no rasters found in extent Extent(-75.2292251586914, 39.894723763816785, -75.12039184570312, 39.93251203374031)
________________________________________________________
Executed in    4.86 secs      fish           external
   usr time   95.58 millis    0.19 millis   95.38 millis
   sys time   27.09 millis    2.32 millis   24.77 millis

The two issues I'm seeing are:

  1. The bbox filter is not working correctly. This is possibly because the shape is not in the right CRS, or I'm not converting a shape into a bbox correctly.
  2. The service is very slow. Corresponding operations in Python feel faster.

Will pair on this tomorrow with @philvarner and @jpolchlo.

rajadain commented 1 month ago

I was able to get a STAC Summary endpoint working in this branch: https://github.com/WikiWatershed/mmw-geoprocessing/tree/tt/113/add-stac-cog-count-endpoint

time xh --verbose http://172.20.0.1:8090/stac shape="{\"type\":\"MultiPolygon\",\"coordinates\":[[[[-75.220813751220703,39.932512033740309],[-75.150260925292969,39.929221201562726],[-75.120391845703125,39.924350479594167],[-75.124168395996094,39.894723763816785],[-75.192832946777344,39.89643587838578],[-75.229225158691406,39.903810652180944],[-75.224761962890625,39.928957928153828],[-75.220813751220703,39.932512033740309]]]]}"
POST /stac HTTP/1.1
Accept: application/json, */*;q=0.5
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Content-Length: 388
Content-Type: application/json
Host: 172.20.0.1:8090
User-Agent: xh/0.18.0

{
    "shape": "{\"type\":\"MultiPolygon\",\"coordinates\":[[[[-75.220813751220703,39.932512033740309],[-75.150260925292969,39.929221201562726],[-75.120391845703125,39.924350479594167],[-75.124168395996094,39.894723763816785],[-75.192832946777344,39.89643587838578],[-75.229225158691406,39.903810652180944],[-75.224761962890625,39.928957928153828],[-75.220813751220703,39.932512033740309]]]]}"
}

HTTP/1.1 200 OK
Content-Length: 80
Content-Type: application/json
Date: Fri, 17 May 2024 03:04:00 GMT
Server: akka-http/10.5.3

{
    "List(2)": 7532,
    "List(8)": 2505,
    "List(11)": 5150,
    "List(7)": 331794,
    "List(1)": 46323
}
________________________________________________________
Executed in    4.55 secs      fish           external
   usr time    2.64 millis  174.00 micros    2.46 millis
   sys time    0.09 millis   87.00 micros    0.00 millis

I think this is cropping the raster to the extent of the AoI correctly, but not the AoI itself. Need to check that next.

rajadain commented 1 month ago

I made updated notebooks with simple shapes to set as the baseline correct numbers. Will compare the output of the geoprocessing service to this.

https://gist.github.com/rajadain/aa7a569fea3f316a0c9c03c846291d17

pomadchin commented 1 month ago

Btw trying it with GDAL on could be a nice experiment, may give some performance benefits (tho carries some hidden GDAL pitfalls along the way).

I’ll cut an updated GT release soon that properly supports GDAL 3.7+

rajadain commented 1 month ago

Anecdotally, running it on my Windows Core i7 was faster than macOS M1 Pro. Possibly an x86 vs ARM issue? Final deployment will be on x86 so should be fine 🤞

pomadchin commented 1 month ago

Could be a JDK issue, what Java version is on M1 vs x86? I’d recommend going to 21 if possible.

rajadain commented 1 month ago

That was a good tip. M1 was on JDK 18, installed 21 and it is markedly faster.

rajadain commented 1 month ago

Alright, I've got a pretty good implementation in my branch https://github.com/WikiWatershed/mmw-geoprocessing/tree/tt/113/add-stac-cog-count-endpoint. It follows the recipe in Python / Jupyter here: https://gist.github.com/rajadain/1d3591a7a00c750466e3793bf1a9bcd2#file-lower-schuylkill-huc-10-test-2-ipynb.

The order of operations in Jupyter is:

  1. Find tiffs in the STAC API that intersect the AoI
  2. For each tiff, clip to AoI extent and reproject to EPSG:5070
    • (really to the correct Albers projection based on where the AoI is, but for simplicity I'm using ConusAlbers EPSG:5070 here)
  3. For each reprojected clip, mask to the AoI
  4. Merge all masked clips
  5. Calculate histogram using np.unique(data, return_counts=True)

This generates an output for the Lower Schuylkill HUC-10 like:

{
    "List(1)": 87250,
    "List(11)": 308877,
    "List(2)": 639920,
    "List(4)": 283,
    "List(5)": 173754,
    "List(7)": 2855276,
    "List(8)": 3126
}

The order of operations in GeoTrellis is:

  1. Find tiffs in the STAC API that intersect with the AoI
  2. Create a MosaicRasterSource from those tiffs, reprojected to EPSG:5070
  3. Clip the RasterSource to the AoI's extent
  4. Mask the Raster to the AoI
  5. Calculate histogram using histogram.binCounts

This results in numbers that are quite close, but not exact:

{
    "List(1)": 105463,
    "List(11)": 390845,
    "List(2)": 730907,
    "List(4)": 525,
    "List(5)": 169751,
    "List(7)": 3591677,
    "List(8)": 5066
}

Here's how they compare:

image

image

Notably, we use the relative distribution in MMW.

Here's another example with Conococheague-Opequon HUC-8:

image

image

I'm going to put up a PR for this and request feedback from the multiple folks that have assisted so far 🙏