WikiWatershed / mmw-geoprocessing

A Spark Job Server job for Model My Watershed geoprocessing.
Apache License 2.0

Collections API: Implement RasterGroupedAverage #52

Closed: rajadain closed this issue 6 years ago

rajadain commented 6 years ago

Add support for the RasterGroupedAverage operation.

Sample input: nlcd-kfactor-request.json.txt

Expected output: nlcd-kfactor-response.json.txt

Depends on #51

mmcfarland commented 6 years ago

Include the optimizations made in https://github.com/WikiWatershed/mmw-geoprocessing/issues/47

rajadain commented 6 years ago

Here's an example of RasterGroupedAverage without any rasters to group by:

awc-request.json.txt

awc-response.json.txt

kellyi commented 6 years ago

#65 is mostly ready for a look, I think, except that I'm getting slightly different output values for the original nlcd-kfactor-request input. Here's what the implementation in #65 currently returns:

{
    "result": {
        "List(42)": 0.22445346552610787,
        "List(22)": 0.20668663718780375,
        "List(43)": 0.23152918045304285,
        "List(71)": 0.26530274008653,
        "List(41)": 0.2639788878915292,
        "List(21)": 0.23288021817461474,
        "List(24)": 0.17446897070338138,
        "List(31)": 0.2255980250794072,
        "List(90)": 0.2502959564896539,
        "List(52)": 0.2708109404806156,
        "List(11)": 0.17810512682757315,
        "List(23)": 0.1956562087269768,
        "List(82)": 0.2680885086530039,
        "List(81)": 0.2662300594180988,
        "List(95)": 0.30244740773923695
    }
}

The values for the RasterGroupedAverage op with no rasters to group by are the same:

{
    "result": {
        "List(0)": 9.937211446569115
    }
}
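For reference, here is a minimal Java sketch of what a grouped average computes. It assumes every raster has been flattened into an array of perfectly aligned cells, which is a simplification: the real job works on tiles, and the sample input above mixes misaligned layers. Each cell's group key is the list of its grouping-raster values, mirroring the `List(42)`-style keys in the output above. The class and method names are hypothetical, and NODATA handling is omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each raster is flattened into an array of the same
// length, so index i refers to the same cell in every raster (assumes the
// layers are perfectly aligned; NODATA cells are not handled).
class GroupedAverageSketch {

    // Running sum and cell count for one group.
    static final class SumCount {
        double sum = 0.0;
        long count = 0;
    }

    static Map<List<Integer>, Double> groupedAverage(int[][] groupRasters, double[] target) {
        Map<List<Integer>, SumCount> acc = new HashMap<>();
        for (int cell = 0; cell < target.length; cell++) {
            // Group key = the grouping-raster values at this cell,
            // e.g. [42] for a cell whose NLCD value is 42.
            List<Integer> key = new ArrayList<>(groupRasters.length);
            for (int[] raster : groupRasters) {
                key.add(raster[cell]);
            }
            SumCount sc = acc.computeIfAbsent(key, k -> new SumCount());
            sc.sum += target[cell];
            sc.count += 1;
        }
        // Turn each (sum, count) pair into an average.
        Map<List<Integer>, Double> result = new HashMap<>();
        for (Map.Entry<List<Integer>, SumCount> e : acc.entrySet()) {
            result.put(e.getKey(), e.getValue().sum / e.getValue().count);
        }
        return result;
    }
}
```

With no grouping rasters, this sketch puts every cell under a single empty-list key. It makes no attempt to reproduce how the service labels that case.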

kellyi commented 6 years ago

I wonder if the discrepancy above is related to the misalignment issue? @rajadain, do you have any other sample inputs for testing the RasterGroupedAverage op with grouping rasters supplied?

rajadain commented 6 years ago

The sample input provided above does use misaligned layers, since NLCD and KFactor belong to different generations of rasters (see https://github.com/WikiWatershed/model-my-watershed/issues/2153). There should only be a difference if a tile is not being selected. I'll try to generate a screenshot using the shape and layers in the input to see whether we're missing any tiles.

rajadain commented 6 years ago

Hmm, so while there is some misalignment, we are still selecting the correct tiles:

[Screenshot: tiles selected for the sample input's shape and layers]

So the numbers should be identical.

kellyi commented 6 years ago

Thanks for checking that! Something's probably off in the implementation then.

kellyi commented 6 years ago

> I'm getting slightly different output values for the original nlcd-kfactor-request input

It turns out this happens when the values are stored in a list and the average is computed as list.sum / list.length. Accumulating into an (accumulator: DoubleAdder, counter: LongAdder) pair and computing accumulator / counter instead brings the values back into alignment with the expected output.
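Below is a minimal Java sketch of the accumulator/counter approach described above. It keeps one `java.util.concurrent.atomic.DoubleAdder` sum and one `LongAdder` count per group key instead of a list of every cell value. The class name, the `int` group keys, and the parallel stream are illustrative assumptions, not the code in #65, and the sketch does not try to explain why the list-based average drifted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.DoubleAdder;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

// Hypothetical sketch: one (accumulator, counter) pair per group key,
// instead of a growing list holding every cell value for that key.
class AdderAverageSketch {

    static final class Accumulator {
        final DoubleAdder sum = new DoubleAdder(); // running total of values
        final LongAdder count = new LongAdder();   // number of values added

        void add(double value) {
            sum.add(value);
            count.increment();
        }

        double average() {
            return sum.sum() / count.sum();
        }
    }

    // keys[i] is the (hypothetical) group key of cell i; values[i] is its value.
    static Map<Integer, Double> average(int[] keys, double[] values) {
        ConcurrentHashMap<Integer, Accumulator> acc = new ConcurrentHashMap<>();
        // Cells may be processed in parallel: computeIfAbsent is atomic on a
        // ConcurrentHashMap, and DoubleAdder/LongAdder are thread-safe.
        IntStream.range(0, values.length).parallel().forEach(i ->
            acc.computeIfAbsent(keys[i], k -> new Accumulator()).add(values[i]));
        Map<Integer, Double> result = new HashMap<>();
        acc.forEach((k, a) -> result.put(k, a.average()));
        return result;
    }
}
```

One design point: each group needs only constant memory for its sum and count, however many cells it covers. Note also that the DoubleAdder Javadoc says the order of accumulation is not guaranteed, so a parallel sum can differ in its last bits from one run to the next.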