WikiWatershed / mmw-geoprocessing

A Spark Job Server job for Model My Watershed geoprocessing.
Apache License 2.0

MapshedJob Improvements #33

Open rajadain opened 8 years ago

rajadain commented 8 years ago

Avenues for potential improvement, if time / budget allows.

NODATA specification in input JSON

In SummaryJob, we handle NODATA in the Soil raster by converting it to a hard-coded value: https://github.com/WikiWatershed/mmw-geoprocessing/blob/develop/summary/src/main/scala/SummaryJob.scala#L105-L108. In MapshedJob, the order of the layers is not known and different layers may have different preferred NODATA values, so consumers would like to specify the value per raster in the request, perhaps as follows:

{
  "input": {
    "rasters": [
      {
        "id": "nlcd-2011-30m-epsg5070-0.10.0",
        "nodata": 0
      },
      {
        "id": "ssurgo-hydro-groups-30m-epsg5070-0.10.0",
        "nodata": 3
      }
    ]
  }
}

Update the endpoint to accept this per-raster NODATA value as a parameter.
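As a rough illustration of the request shape above, here is a minimal Python sketch of how a consumer might build the payload and how the per-raster values could be looked up by id. The field names `input`, `rasters`, `id`, and `nodata` come from the proposed JSON; the helper names `build_request` and `nodata_by_raster` are hypothetical, and the job itself is written in Scala, so this only shows the intended data flow.

```python
import json


def build_request(rasters):
    """Build the proposed request body from (raster_id, nodata) pairs.

    The order of `rasters` is preserved, since the job does not know
    the layer order ahead of time.
    """
    return {
        "input": {
            "rasters": [
                {"id": raster_id, "nodata": nodata}
                for raster_id, nodata in rasters
            ]
        }
    }


def nodata_by_raster(request):
    """Map each raster id to its requested NODATA replacement.

    A raster without a "nodata" key maps to None, which a job could
    treat as "no replacement requested" (an assumption, not current behavior).
    """
    return {r["id"]: r.get("nodata") for r in request["input"]["rasters"]}


request = build_request([
    ("nlcd-2011-30m-epsg5070-0.10.0", 0),
    ("ssurgo-hydro-groups-30m-epsg5070-0.10.0", 3),
])

# Round-trip through JSON to confirm the payload is valid.
payload = json.dumps(request)
lookup = nodata_by_raster(json.loads(payload))
```

Keying the lookup by raster id rather than by position means the values stay correct regardless of the order in which layers are listed.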

Support More Rasters

MapShed requires touching 8 raster data sets, although a single field needs at most 3. Currently we have an implementation that can handle at most 3, with the intent that we will make different calls for different combinations.

However, if there are many combinations, and we end up downloading the same set of tiles over and over again (for example, almost all of them use NLCD), then it may be more efficient to do one big join over all the datasets and then extract the necessary combinations in Python.

If this is the case, we should support up to 8 rasters.
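To make the "one big join, then select in Python" idea concrete, here is a minimal sketch. It assumes, purely for illustration, that the joined result arrives as a mapping from a tuple of cell values (one per raster, in request order) to a cell count; the actual response format is not specified in this issue. The function name `project_counts` and the short layer names are hypothetical.

```python
from collections import Counter


def project_counts(joined, layer_ids, wanted):
    """Collapse a full join down to the layers one field needs.

    joined:    {(value_per_layer, ...): cell_count} (assumed shape)
    layer_ids: layer names in the same order as the tuple positions
    wanted:    subset of layer_ids needed for a given field
    """
    positions = [layer_ids.index(layer) for layer in wanted]
    projected = Counter()
    for values, count in joined.items():
        # Keep only the wanted positions and sum counts that now share a key.
        key = tuple(values[p] for p in positions)
        projected[key] += count
    return dict(projected)


# Made-up example data for three layers.
layer_ids = ["nlcd", "soil", "slope"]
joined = {
    (11, 1, 5): 10,
    (11, 2, 5): 4,
    (21, 1, 7): 3,
    (11, 1, 7): 2,
}

nlcd_soil = project_counts(joined, layer_ids, ["nlcd", "soil"])
nlcd_only = project_counts(joined, layer_ids, ["nlcd"])
```

With this approach the tiles are fetched once, and each field's 2- or 3-layer combination is derived from the same result. The trade-off is that the number of distinct keys in the full join can grow quickly as more rasters are added.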