WikiWatershed / mmw-geoprocessing

A Spark Job Server job for Model My Watershed geoprocessing.
Apache License 2.0

Less Specific Endpoints #32

Closed: rajadain closed this 8 years ago

rajadain commented 8 years ago

Overview

In order to handle MapShed requirements, we need to support the following use cases:

These commits add support for these use cases. Sample requests have been added or updated where necessary.

Testing Instructions

Try out the sample requests and ensure that the results are as expected.

Notes

Currently there is a very large overlap between rasterLinesJoin and rasterJoin. Is there a way we can reduce the redundancy there?

In SummaryJob, we handle NODATA in the Soil raster by converting it to a hard-coded value: https://github.com/WikiWatershed/mmw-geoprocessing/blob/develop/summary/src/main/scala/SummaryJob.scala#L105-L108. In this case, since the order of the layers is not known, and different layers may have different preferred values for NODATA, the consumers would like to specify it in the request, perhaps as follows:

{
  "input": {
    "rasters": [
      {
        "id": "nlcd-2011-30m-epsg5070-0.10.0",
        "nodata": 0
      },
      {
        "id": "ssurgo-hydro-groups-30m-epsg5070-0.10.0",
        "nodata": 3
      }
    ]
  }
}

But since in the code we are operating on a joined set of rasters, I'm not sure how we would use this value.
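One possible way to use such a value, sketched in Scala under the assumption that the join keeps the rasters in request order, so the i-th value of every joined cell comes from the i-th entry of "rasters". LayerSpec and substitute are hypothetical names, not the project's code. The sentinel Int.MinValue matches GeoTrellis's integer NODATA constant, but treat it as an assumption for the version in use.

```scala
object PerLayerNoData {
  // Assumption: integer NODATA is encoded as Int.MinValue.
  val NoData: Int = Int.MinValue

  // Hypothetical mirror of one entry in the proposed "rasters" request array.
  final case class LayerSpec(id: String, nodata: Option[Int])

  // Replace NODATA in one joined cell, using each layer's own replacement value.
  // Assumes cell(i) was read from the layer described by specs(i).
  def substitute(specs: Seq[LayerSpec], cell: Seq[Int]): Seq[Int] = {
    require(specs.length == cell.length, "expected one value per requested layer")
    specs.zip(cell).map {
      case (LayerSpec(_, Some(replacement)), v) if v == NoData => replacement
      case (_, v) => v // no replacement requested, or value is not NODATA
    }
  }

  def main(args: Array[String]): Unit = {
    val specs = Seq(
      LayerSpec("nlcd-2011-30m-epsg5070-0.10.0", Some(0)),
      LayerSpec("ssurgo-hydro-groups-30m-epsg5070-0.10.0", Some(3))
    )
    println(substitute(specs, Seq(NoData, 2)))  // prints List(0, 2)
    println(substitute(specs, Seq(41, NoData))) // prints List(41, 3)
  }
}
```

Because the replacement is applied per position rather than per raster type, the joined representation does not need to know which layer is soil and which is land cover.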

Connects #30
Connects #28

jamesmcclain commented 8 years ago

Okay, I will take a look by the end of the day.

jamesmcclain commented 8 years ago

+1

Just a couple of comments on the notes:

Currently there is a very large overlap between rasterLinesJoin and rasterJoin. Is there a way we can reduce the redundancy there?

It looks like one is concerned with MultiLines and the other with MultiPolygons, both of which are geometry types. It should be possible to write one function that handles both cases (most probably by using match/case).
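A minimal sketch of that suggestion in Scala. To stay self-contained it uses hypothetical stand-in types (MultiLineG, MultiPolygonG, with polygons simplified to grid-aligned rectangles) instead of the GeoTrellis geometry classes, and it assumes the join's output is a count of each combination of layer values. The only geometry-specific code is the match in cellsOf; the counting is shared.

```scala
object GeometryJoin {
  // Hypothetical stand-ins in grid (col, row) coordinates; NOT the GeoTrellis types.
  sealed trait Geom
  final case class MultiLineG(segments: Seq[((Int, Int), (Int, Int))]) extends Geom
  // Polygons simplified to rectangles covering [colMin, colMax) x [rowMin, rowMax).
  final case class Rect(colMin: Int, rowMin: Int, colMax: Int, rowMax: Int)
  final case class MultiPolygonG(rects: Seq[Rect]) extends Geom

  // Bresenham's line algorithm: every cell from (x0, y0) to (x1, y1), inclusive.
  private def lineCells(x0: Int, y0: Int, x1: Int, y1: Int): Seq[(Int, Int)] = {
    val dx = math.abs(x1 - x0)
    val dy = -math.abs(y1 - y0)
    val sx = if (x0 < x1) 1 else -1
    val sy = if (y0 < y1) 1 else -1
    var err = dx + dy
    var x = x0
    var y = y0
    val out = scala.collection.mutable.ArrayBuffer((x, y))
    while (x != x1 || y != y1) {
      val e2 = 2 * err
      if (e2 >= dy) { err += dy; x += sx }
      if (e2 <= dx) { err += dx; y += sy }
      out += ((x, y))
    }
    out.toList
  }

  // The geometry-specific part: which cells a geometry touches.
  // distinct() avoids double-counting shared segment endpoints or overlaps.
  def cellsOf(g: Geom): Seq[(Int, Int)] = g match {
    case MultiLineG(segs) =>
      segs.flatMap { case ((x0, y0), (x1, y1)) => lineCells(x0, y0, x1, y1) }.distinct
    case MultiPolygonG(rects) =>
      rects.flatMap { r =>
        for (row <- r.rowMin until r.rowMax; col <- r.colMin until r.colMax) yield (col, row)
      }.distinct
  }

  // One join for both geometry kinds: count each combination of layer values
  // over the covered cells. Layers are row-major (layer(row)(col)).
  def rasterJoin(layers: Seq[Array[Array[Int]]], g: Geom): Map[Seq[Int], Int] =
    cellsOf(g)
      .map { case (col, row) => layers.map(layer => layer(row)(col)) }
      .groupBy(identity)
      .map { case (key, hits) => key -> hits.size }

  def main(args: Array[String]): Unit = {
    val a = Array(Array(1, 1, 2), Array(1, 2, 2), Array(3, 3, 3))
    val b = Array(Array(0, 0, 0), Array(0, 0, 1), Array(1, 1, 1))
    println(rasterJoin(Seq(a, b), MultiLineG(Seq(((0, 0), (2, 0))))))
    println(rasterJoin(Seq(a, b), MultiPolygonG(Seq(Rect(0, 0, 2, 2)))))
  }
}
```

Both geometry kinds go through the same rasterJoin call; in the real job the polygon branch would rasterize actual polygons rather than rectangles, but the shape of the refactoring stays the same.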

In this case, since the order of the layers is not known, and different layers may have different preferred values for NODATA, the consumers would like to specify it in the request

That sounds like a fine plan.

Before I forget, I just want to mention that, as the code is currently written, all raster layers passed in must be integer layers. The reason is that the get method on the Tile type returns an integer, whereas the getDouble method returns a double. In all probability, this means that if you want to support scenarios in which even one layer has floating-point values, you must treat all layers as if they do.
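A small Scala illustration of that point. SimpleTile, IntTile, DoubleTile and readCell are hypothetical and only mimic the get/getDouble split described above; they are not the GeoTrellis Tile API. If any layer is floating-point, every layer is read with getDouble so that no value is truncated.

```scala
object MixedCellTypes {
  // Hypothetical minimal tile interface: get returns Int, getDouble returns Double.
  trait SimpleTile {
    def isFloat: Boolean
    def get(col: Int, row: Int): Int
    def getDouble(col: Int, row: Int): Double
  }

  final case class IntTile(cells: Array[Array[Int]]) extends SimpleTile {
    val isFloat = false
    def get(col: Int, row: Int): Int = cells(row)(col)
    def getDouble(col: Int, row: Int): Double = cells(row)(col).toDouble
  }

  final case class DoubleTile(cells: Array[Array[Double]]) extends SimpleTile {
    val isFloat = true
    // Reading a floating-point cell as Int silently drops the fractional part.
    def get(col: Int, row: Int): Int = cells(row)(col).toInt
    def getDouble(col: Int, row: Int): Double = cells(row)(col)
  }

  // If even one layer is floating-point, read all layers as Double;
  // otherwise keep integer values.
  def readCell(layers: Seq[SimpleTile], col: Int, row: Int): Either[Seq[Int], Seq[Double]] =
    if (layers.exists(_.isFloat)) Right(layers.map(_.getDouble(col, row)))
    else Left(layers.map(_.get(col, row)))
}
```

Returning an Either makes the caller handle the two cases explicitly instead of mixing Int and Double values in one joined tuple.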

jamesmcclain commented 8 years ago

I would like to try to prepare a pull request against this branch today. If I am not able to do so by the end of the day, feel free to merge.

rajadain commented 8 years ago

Given the updates to rasterLinesJoin in your work @jamesmcclain, should I also update rasterJoin similarly?

jamesmcclain commented 8 years ago

Given the updates to rasterLinesJoin in your work @jamesmcclain, should I also update rasterJoin similarly?

I don't think that rasterJoin suffers from the same problem as rasterLinesJoin. The main issue in the latter was the serialization and deserialization of a large number of lines, but in the former there are always a relatively modest number of polygons (I think -- correct me if I am wrong).

rajadain commented 8 years ago

but in the former there are always a relatively modest number of polygons

Ah, that's correct; now I understand what your PR sped up. I'll give it one last pass and then merge. Thanks!

rajadain commented 8 years ago

Thanks for all your assistance! I'm merging this in and creating a new release.

jamesmcclain commented 8 years ago

Cool, happy to help!