Closed kellyi closed 6 years ago
c962ce2 reverts changing the input data format so we can keep using jars in the existing MMW collections api branch for testing.
650fdac splits out some of the likelier common setup ops into separate helper functions with doc strings
Also updated the actual rasterGroupedCount fn to just return a Map[String, Int] so that the interface function only has to handle serializing it.
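A minimal sketch of that split, assuming the counting side accumulates into a TrieMap of LongAdder values (as the diff later in this thread suggests) and the interface side only serializes. The object name, the `toJson` helper, and the use of `List.toString` for the map keys are assumptions made for illustration, not the code in this PR:

```scala
import java.util.concurrent.atomic.LongAdder
import scala.collection.concurrent.TrieMap

object GroupedCountSketch {
  // Count how often each combination of per-layer cell values occurs
  // and return a plain Map[String, Int] with nothing JSON-specific in it.
  def groupedCount(cells: Seq[List[Int]]): Map[String, Int] = {
    val pixelGroups = TrieMap.empty[List[Int], LongAdder]
    cells.foreach { group =>
      pixelGroups.getOrElseUpdate(group, new LongAdder).increment()
    }
    // Assumption: keys are the List's toString form, e.g. "List(11, 2)".
    pixelGroups.iterator.map { case (k, v) => k.toString -> v.sum.toInt }.toMap
  }

  // Hypothetical stand-in for the interface function: it only serializes.
  // Keys are sorted so the output is deterministic.
  def toJson(counts: Map[String, Int]): String =
    counts.toSeq.sortBy(_._1)
      .map { case (k, v) => s""""$k":$v""" }
      .mkString("{", ",", "}")
}
```

Keeping the return type a plain Map means the counting code can be compared against saved outputs without going through the HTTP layer.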
140c6a3 makes some more simplifications -- can push all these formatters into a separate trait.
1508a96 splits the JobUtils trait into Geoprocessing and Utils and also puts the S3 bucket id in the config file.
Compared the output against previously saved RasterGroupedCount operations. They are all identical, except for this one: nlcd-soils-request-huc10.json.txt. This succeeds in the old geoprocessing service, but fails in this one with this failure:
Going to experiment a bit to see if I can pinpoint the issue, but it is likely a combination of using the us-ssurgo-texture-id-30m-epsg5070 layer and the new MultiPolygon intersection.
Here's what the shape looks like:
Created https://github.com/WikiWatershed/model-my-watershed/issues/2153 for the bug mentioned above. That is out of scope for this work, and should be handled separately. Going to take another look at the code.
This import is not used and should be removed:
diff --git a/api/src/main/scala/WebServer.scala b/api/src/main/scala/WebServer.scala
index 432fb5a..28d783c 100644
--- a/api/src/main/scala/WebServer.scala
+++ b/api/src/main/scala/WebServer.scala
@@ -4,7 +4,6 @@ import akka.http.scaladsl.unmarshalling.Unmarshaller._
import akka.http.scaladsl.server.{ HttpApp, Route }
import akka.http.scaladsl.marshallers.sprayjson.SprayJsonSupport._
import spray.json._
-import DefaultJsonProtocol._
import com.typesafe.config.ConfigFactory
import com.typesafe.scalalogging.LazyLogging
I just did a quick test to see if Vector was faster than using a List or Seq:
diff --git a/api/src/main/scala/Geoprocessing.scala b/api/src/main/scala/Geoprocessing.scala
index 2a0a40c..c193386 100644
--- a/api/src/main/scala/Geoprocessing.scala
+++ b/api/src/main/scala/Geoprocessing.scala
@@ -35,7 +35,7 @@ trait Geoprocessing extends Utils {
* @return A Map of cell counts
*/
private def rasterGroupedCount(
- rasterLayers: Seq[TileLayerCollection[SpatialKey]],
+ rasterLayers: Vector[TileLayerCollection[SpatialKey]],
multiPolygon: MultiPolygon
): Map[String, Int] = {
val init = () => new LongAdder
@@ -43,7 +43,7 @@ trait Geoprocessing extends Utils {
// assume all the layouts are the same
val metadata = rasterLayers.head.metadata
- var pixelGroups: TrieMap[List[Int], LongAdder] = TrieMap.empty
+ var pixelGroups: TrieMap[Vector[Int], LongAdder] = TrieMap.empty
joinCollectionLayers(rasterLayers).par
.foreach({ case (key, tiles) =>
@@ -52,7 +52,7 @@ trait Geoprocessing extends Utils {
metadata.layout.tileRows)
Rasterizer.foreachCellByMultiPolygon(multiPolygon, re) { case (col, row) =>
- val pixelGroup: List[Int] = tiles.map(_.get(col, row)).toList
+ val pixelGroup: Vector[Int] = tiles.map(_.get(col, row)).toVector
val acc = pixelGroups.getOrElseUpdate(pixelGroup, init())
update(acc)
}
diff --git a/api/src/main/scala/Utils.scala b/api/src/main/scala/Utils.scala
index 2f7b0a8..03cb3f9 100644
--- a/api/src/main/scala/Utils.scala
+++ b/api/src/main/scala/Utils.scala
@@ -28,7 +28,7 @@ trait Utils {
* @param aoi A MultiPolygon area of interest
* @return [[TileLayerCollection[SpatialKey]]]
*/
- def createRasterLayerIds(rasterIds: List[String], zoom: Int, aoi: MultiPolygon) =
+ def createRasterLayerIds(rasterIds: Vector[String], zoom: Int, aoi: MultiPolygon) =
rasterIds
.map { str => LayerId(str, zoom) }
.map { layer => fetchLayer(layer, aoi)}
@@ -86,12 +86,12 @@ trait Utils {
* @return A map of Tile sequences, keyed with the SpatialKey
*/
def joinCollectionLayers(
- layers: Seq[TileLayerCollection[SpatialKey]]
- ): Map[SpatialKey, Seq[Tile]] = {
- val maps: Seq[Map[SpatialKey, Tile]] = layers.map((_: Seq[(SpatialKey, Tile)]).toMap)
+ layers: Vector[TileLayerCollection[SpatialKey]]
+ ): Map[SpatialKey, Vector[Tile]] = {
+ val maps: Vector[Map[SpatialKey, Tile]] = layers.map((_: Seq[(SpatialKey, Tile)]).toMap)
val keySet: Array[SpatialKey] = maps.map(_.keySet).reduce(_ union _).toArray
for (key: SpatialKey <- keySet) yield {
- val tiles: Seq[Tile] = maps.map(_.apply(key))
+ val tiles: Vector[Tile] = maps.map(_.apply(key))
key -> tiles
}
}.toMap
diff --git a/api/src/main/scala/WebServer.scala b/api/src/main/scala/WebServer.scala
index 432fb5a..e1231ea 100644
--- a/api/src/main/scala/WebServer.scala
+++ b/api/src/main/scala/WebServer.scala
@@ -9,8 +9,8 @@ import DefaultJsonProtocol._
import com.typesafe.config.ConfigFactory
import com.typesafe.scalalogging.LazyLogging
-case class InputData(operationType: String, rasters: List[String], zoom: Int,
- polygonCRS: String, rasterCRS: String, polygon: List[String])
+case class InputData(operationType: String, rasters: Vector[String], zoom: Int,
+ polygonCRS: String, rasterCRS: String, polygon: Vector[String])
case class PostRequestJson(input: InputData)
But the response times were almost the same:
# List
for i in 1 2 3 4 5; time -p http --timeout=90 :8090/run < nlcd-soils-request-huc8.json 2>&1 > /dev/null | grep real | awk '{print $2}'; end
9.47
4.97
5.08
4.84
4.92
# Vector
for i in 1 2 3 4 5; time -p http --timeout=90 :8090/run < nlcd-soils-request-huc8.json 2>&1 > /dev/null | grep real | awk '{print $2}'; end
9.27
4.97
4.75
4.17
4.87
So we should keep using what we have. 👍
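To put a number on "almost the same", here is a small sketch that averages the timings above. Dropping the first run of each series as warm-up is an assumption on my part (it is roughly twice as slow in both cases), and the object and method names are made up for illustration:

```scala
object TimingSummary {
  // Wall-clock seconds copied from the two runs above.
  val listRuns: Vector[Double] = Vector(9.47, 4.97, 5.08, 4.84, 4.92)
  val vectorRuns: Vector[Double] = Vector(9.27, 4.97, 4.75, 4.17, 4.87)

  // Mean of the runs after dropping the first one, treated here as warm-up.
  def warmMean(runs: Vector[Double]): Double = {
    val warm = runs.drop(1)
    warm.sum / warm.size
  }
}
```

With these inputs the warm means work out to about 4.95 s for List and 4.69 s for Vector. With only four warm samples per series, that gap is small enough to be run-to-run noise.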
eae94bd addresses the comments above; 0c2c264 uses the Result case class to structure the response. The intervening commit is just noise. I'll squash this down to one commit when it's ready to go in.
I checked, and the response is still the same.
Thanks for all your help with this! I dropped the superfluous toJson call, then squashed that change into the commit. Going to merge this in when the tests pass.
Overview
This PR adds a JobUtils trait including a method to perform the RasterGroupedCount operation from a given input JSON, along with the necessary helper methods and a case class to destructure the input JSON.
We decided to modify the input shape: previously it had a top-level "input" key and then all the data beneath it; now we make the other keys top level & drop "input". We also changed "polygon" to "polygons" to match what's in that value more accurately.
Connects #51
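A rough sketch of the shape change described here, using plain case classes. The old shape mirrors the InputData and PostRequestJson classes shown in the diff in this thread; the flattened class and the `flatten` helper are hypothetical names for illustration. Note that, per the comments, c962ce2 later reverted the input format change so the existing jars could keep being used for testing.

```scala
object InputShapeSketch {
  // Old shape: all fields nested under a top-level "input" key.
  case class OldInputData(operationType: String, rasters: List[String], zoom: Int,
    polygonCRS: String, rasterCRS: String, polygon: List[String])
  case class OldPostRequest(input: OldInputData)

  // Flattened shape (hypothetical class name): the same fields at the
  // top level, with "polygon" renamed to "polygons".
  case class FlatInputData(operationType: String, rasters: List[String], zoom: Int,
    polygonCRS: String, rasterCRS: String, polygons: List[String])

  // Hypothetical helper that converts an old-style request to the flat shape.
  def flatten(old: OldPostRequest): FlatInputData = {
    val in = old.input
    FlatInputData(in.operationType, in.rasters, in.zoom,
      in.polygonCRS, in.rasterCRS, in.polygon)
  }
}
```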
Demo
Here's the current response from POSTing a modified version of input offered in #51:
Notes
I roughly started to organize the JobUtils trait by putting definitely public methods -- getRasterGroupedCount -- first, then moving helper methods to the bottom. May make sense to mark those private.
I also declared two type aliases before the joinCollectionLayers method to help shorten some lines & reduce long declarations. Happy to change this back if we'd like.
Testing Instructions
./scripts/server
...and verify that the response matches the test output in #51.
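For reference on the type aliases mentioned in the Notes: their names are not shown in this thread, so the sketch below uses hypothetical alias names and small stand-in classes in place of the GeoTrellis SpatialKey and Tile types. The join logic follows the joinCollectionLayers diff above and, like it, assumes every layer contains every key.

```scala
object AliasSketch {
  // Stand-ins for the GeoTrellis types so the sketch compiles on its own.
  case class SpatialKey(col: Int, row: Int)
  case class Tile(id: String)

  // Hypothetical alias names; they only shorten the signature below.
  type KeyedTiles = Seq[(SpatialKey, Tile)]
  type TilesByKey = Map[SpatialKey, Seq[Tile]]

  // Join several keyed layers into one map from key to that key's tiles,
  // one tile per layer, in layer order. Throws if a layer lacks a key
  // or if no layers are given, as the version in the diff would.
  def joinCollectionLayers(layers: Seq[KeyedTiles]): TilesByKey = {
    val maps = layers.map(_.toMap)
    val keys = maps.map(_.keySet).reduce(_ union _)
    keys.iterator.map { key => key -> maps.map(_(key)) }.toMap
  }
}
```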