Add RasterGroupedSum and RasterAverage Endpoints

rajadain commented 8 years ago

Overview

Introduces two new endpoints: RasterGroupedSum which given a polygon, a list of rasters, and a target raster, will return the sum of value of the target raster grouped by the values of the list of rasters, clipped to the polygon. RasterAverage will simply average the value of the target raster clipped to the polygon.

Testing Instructions

Test RasterGroupedSum with:

{
"input": {
  "polygon": [
    "{\"type\":\"MultiPolygon\",\"coordinates\":[[[[-75.1626205444336,39.95580659996906],[-75.25531768798828,39.94514735903112],[-75.22785186767578,39.89446035777916],[-75.1461410522461,39.88761144548104],[-75.09309768676758,39.91078961774283],[-75.09464263916016,39.93817189499188],[-75.12039184570312,39.94435771955196],[-75.1626205444336,39.95580659996906]]]]}"
  ],
  "polygonCRS": "LatLng",
  "targetRaster": "us-ssugro-kfactor-30m-epsg5070",
  "rasters": [
    "nlcd-2011-30m-epsg5070-0.10.0"
  ],
  "rasterCRS": "ConusAlbers",
  "operationType": "RasterGroupedSum",
  "zoom": 0
}
}

The output of Spark JobServer should be:

{
  "classPath": "org.wikiwatershed.mmw.geoprocessing.MapshedJob", 
  "context": "geoprocessing", 
  "duration": "5.126 secs", 
  "jobId": "5c070c2b-1d0c-465a-b81e-a8d7d9495f65", 
  "result": {
      "List(11)": 5.184000246226788, 
      "List(21)": 138.13869930896908, 
      "List(22)": 98.31446076370776, 
      "List(23)": 426.0037625543773, 
      "List(24)": 459.6155584678054, 
      "List(31)": 0.0, 
      "List(41)": 47.09073946066201, 
      "List(43)": 0.0, 
      "List(52)": 0.49600002355873585, 
      "List(71)": 0.0800000037997961, 
      "List(81)": 0.0, 
      "List(82)": 0.0, 
      "List(90)": 18.74480983428657, 
      "List(95)": 0.6880000326782465
  }, 
  "startTime": "2016-05-10T18:50:13.388Z", 
  "status": "FINISHED"
}

Test RasterAverage with:

{
"input": {
  "polygon": [
    "{\"type\":\"MultiPolygon\",\"coordinates\":[[[[-75.1626205444336,39.95580659996906],[-75.25531768798828,39.94514735903112],[-75.22785186767578,39.89446035777916],[-75.1461410522461,39.88761144548104],[-75.09309768676758,39.91078961774283],[-75.09464263916016,39.93817189499188],[-75.12039184570312,39.94435771955196],[-75.1626205444336,39.95580659996906]]]]}"
  ],
  "targetRaster": "us-ssugro-kfactor-30m-epsg5070",
  "rasterCRS": "ConusAlbers",
  "polygonCRS": "LatLng",
  "operationType": "RasterAverage",
  "zoom": 0
}
}

The output of Spark JobServer should be:

{
  "classPath": "org.wikiwatershed.mmw.geoprocessing.MapshedJob", 
  "context": "geoprocessing", 
  "duration": "6.827 secs", 
  "jobId": "bafa10aa-7522-411e-a354-af05c8a95426", 
  "result": {
      "List(0)": 0.01393712694520248
  }, 
  "startTime": "2016-05-10T18:49:51.825Z", 
  "status": "FINISHED"
}

Notes

Does abstracting the distinct pixel counting logic in getDistinctPixels(multiPolygons, extent, rasterExtent) affect performance? I hope one function call per tile would add a very small amount to execution (and maybe even get inlined by the compiler?).

Connects #34 Connects #35

kellyi commented 8 years ago

Checked this out to test an outstanding PR and wanted to know if it was worthwhile bumping the version to 1.2 for these changes? Not sure how closely we're trying to hew to semver.

rajadain commented 8 years ago

The version will be bumped to 1.2 once this is merged.

kellyi commented 8 years ago

👍

lewfish commented 8 years ago

Instead of having a separate RasterAverage operation, I think it should just be an instance of RasterGroupedAverage where rasters is an empty list.

rajadain commented 8 years ago

Also added RasterGroupedAverage. The code to handle grouping vs non-grouping is different enough to warrant specifying different operation names, I think.

This can be tested with

{
  "input": {
    "polygon": [
      "{\"type\":\"MultiPolygon\",\"coordinates\":[[[[-75.1626205444336,39.95580659996906],[-75.25531768798828,39.94514735903112],[-75.22785186767578,39.89446035777916],[-75.1461410522461,39.88761144548104],[-75.09309768676758,39.91078961774283],[-75.09464263916016,39.93817189499188],[-75.12039184570312,39.94435771955196],[-75.1626205444336,39.95580659996906]]]]}"
    ],
    "polygonCRS": "LatLng",
    "targetRaster": "us-ssugro-kfactor-30m-epsg5070",
    "rasters": [
      "nlcd-2011-30m-epsg5070-0.10.0"
    ],
    "rasterCRS": "ConusAlbers",
    "operationType": "RasterGroupedAverage",
    "zoom": 0
  }
}

for which I get a result

{
    "result": {
        "List(11)": 0.0008092413746841691, 
        "List(21)": 0.028911406301584154, 
        "List(22)": 0.01676576752450678, 
        "List(23)": 0.016573442365171852, 
        "List(24)": 0.011148141032012355, 
        "List(31)": 0.0, 
        "List(41)": 0.060527942751493584, 
        "List(43)": 0.0, 
        "List(52)": 0.012717949322018867, 
        "List(71)": 0.0007272727618163282, 
        "List(81)": 0.0, 
        "List(82)": 0.0, 
        "List(90)": 0.03929729525007667, 
        "List(95)": 0.006490566346021193
    }
}

jamesmcclain commented 8 years ago

Okay, I'll check it out.

jamesmcclain commented 8 years ago

Does abstracting the distinct pixel counting logic in getDistinctPixels(multiPolygons, extent, rasterExtent) affect performance? I hope one function call per tile would add a very small amount to execution (and maybe even get inlined by the compiler?).

All of this stuff ends up getting (re)compiled in a profile-guided way -- that is part of what is responsible for the "warm up" effect that we see.

Instead of having a separate RasterAverage operation, I think it should just be an instance of RasterGroupedAverage where rasters is an empty list.

I agree.

Just as a general comment, MapshedJob.scala is starting to get pretty big, with a fair amount of similarity between parts. Aside form Lewis' suggestion, I am not sure what can easily be done to address that right now, but it is something to keep in mind.

If Lewis' suggestion can be easily implemented, I think it should be. Other than that, +1.

rajadain commented 8 years ago

Just did a bunch of refactoring. I'm sure more could be done, but I'm not well-versed enough in Scala to actually do it.

The main changes from the API perspective are as follows:

RasterAverage has been removed, instead now you must send RasterGroupedAverage with an empty list in rasters
RasterJoin has been renamed to RasterGroupedCount

Please update your respective PRs if they use these methods.

jamesmcclain commented 8 years ago

There still seems to be quite a bit of repetition (or at least rhyming) but no realistic ideas for addressing it area springing immediately to mind for me, either.

rajadain commented 8 years ago

I agree. Unfortunately, we need these endpoints for other cards to be unblocked. I'm going to clean these commits and merge shortly after.

jamesmcclain commented 8 years ago

Cool.

WikiWatershed / mmw-geoprocessing