WikiWatershed / mmw-geoprocessing

A Spark Job Server job for Model My Watershed geoprocessing.
Apache License 2.0

Upgrade to GeoTrellis 0.10 #22

Closed by rajadain 8 years ago

rajadain commented 8 years ago

(Checklist taken from the comment below so that it shows up nicely in GitHub)

jamesmcclain commented 8 years ago

Here is the metadata update information.

The old metadata: metadatassurgo-hydro-groups-30m-epsg50700.json.txt

The new metadata: metadatassurgo-hydro-groups-30m-new0.json.txt

rajadain commented 8 years ago

Thanks, @jamesmcclain. The high-level status is that by making the following changes to the metadata files:

```
+ header.format
= keyBounds -> keyIndex.properties.keyBounds
+ keyIndex.type
- keyIndex.obj
+ metadata.bounds
```

we can almost get it to work with the new GeoTrellis. The remaining steps are:

The deployment will work because the setting in MMW points to the name of the metadata file, which in turn points to the path of the RDDs. Since both metadata files point to the same RDDs, we don't need two copies of them.
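The five metadata edits listed above can be sketched as a small migration script. Only the field moves themselves come from the comment; the helper name, the placeholder values for `header.format` and `keyIndex.type`, and the choice to copy `keyBounds` into `metadata.bounds` are all assumptions, not the actual GeoTrellis 0.10 layout:

```python
def migrate_metadata(old):
    """Rewrite pre-0.10 layer metadata into the (assumed) 0.10 shape.

    Applies the edits from the comment above:
      + header.format
      = keyBounds -> keyIndex.properties.keyBounds
      + keyIndex.type
      - keyIndex.obj
      + metadata.bounds
    """
    new = dict(old)
    # + header.format (the value here is a placeholder assumption)
    new.setdefault("header", {})["format"] = "geotiff"
    # = keyBounds -> keyIndex.properties.keyBounds,
    # + keyIndex.type, - keyIndex.obj
    key_index = dict(new.pop("keyIndex", {}))
    key_index.pop("obj", None)
    key_index["type"] = "zorder"  # placeholder assumption
    key_index["properties"] = {"keyBounds": new.pop("keyBounds", None)}
    new["keyIndex"] = key_index
    # + metadata.bounds (copying keyBounds here is a guess)
    new.setdefault("metadata", {})["bounds"] = key_index["properties"]["keyBounds"]
    return new
```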

jamesmcclain commented 8 years ago

Late-breaking news: I just got the 0.10-RC1 jar running on spark-jobserver. The SJS configuration must be changed to use Kryo serialization. An example application.conf is included below:

application.conf.txt

Here is just the diff from the default one that came in the SJS 0.6.1 tree:

```diff
@@ -3,7 +3,8 @@
   master = "local[4]"
   # spark web UI port
   webUrlPort = 8080
-
+  serializer = org.apache.spark.serializer.KryoSerializer
+
   jobserver {
     port = 8090
     bind-address = "0.0.0.0"
@@ -149,7 +150,7 @@
   default-host-header = "spray.io:8765"

   # Increase this in order to upload bigger job jars
-  parsing.max-content-length = 30m
+  parsing.max-content-length = 300m
 }

 shiro {
```

Here is the request.json that I used: request.json.txt

Here are the results when I use it:

```json
{
  "result": [{
    "(71,3)": 865,
    "(90,3)": 1650,
    "(22,7)": 5,
    "(11,1)": 104,
    "(11,6)": 10,
    "(52,3)": 7051,
    "(22,3)": 7,
    "(22,1)": 6,
    "(11,7)": 276,
    "(90,2)": 644,
    "(21,3)": 89,
    "(11,3)": 10998,
    "(31,7)": 208,
    "(52,6)": 40,
    "(52,2)": 1213,
    "(90,1)": 1901,
    "(43,4)": 40799,
    "(11,2)": 81,
    "(21,7)": 235,
    "(42,6)": 63,
    "(95,3)": 144,
    "(71,7)": 242,
    "(90,7)": 7473,
    "(71,1)": 207,
    "(95,2)": 8,
    "(52,7)": 5897,
    "(41,1)": 7486,
    "(22,4)": 13,
    "(41,7)": 10741,
    "(21,4)": 434,
    "(31,4)": 254,
    "(42,7)": 12392,
    "(43,3)": 21667,
    "(71,6)": 17,
    "(41,2)": 1826,
    "(52,1)": 2476,
    "(90,6)": 2324,
    "(21,2)": 19,
    "(95,7)": 471,
    "(42,4)": 30492,
    "(21,1)": 297,
    "(71,2)": 198,
    "(41,6)": 245,
    "(21,6)": 65,
    "(42,1)": 4790,
    "(43,6)": 380,
    "(43,7)": 19607,
    "(41,4)": 32622,
    "(52,4)": 9290,
    "(41,3)": 24396,
    "(71,4)": 797,
    "(42,3)": 4531,
    "(95,6)": 112,
    "(95,1)": 68,
    "(95,4)": 731,
    "(43,1)": 13473,
    "(90,4)": 11363,
    "(42,2)": 1962,
    "(11,4)": 392,
    "(31,2)": 93,
    "(31,3)": 100,
    "(31,1)": 96,
    "(43,2)": 3737
  }]
}
```
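For reference, each key in the result map is a string-encoded pair of cell values with its count. A small sketch of consuming such a response, totaling counts by the first element of each pair (the interpretation of that element as an NLCD land-cover class is an assumption here, as is the helper name):

```python
import json
from ast import literal_eval
from collections import defaultdict

def summarize(response_text):
    """Parse an SJS histogram response like the one above and
    total counts by the first element of each "(a,b)" key."""
    result = json.loads(response_text)["result"][0]
    totals = defaultdict(int)
    for key, count in result.items():
        first, _second = literal_eval(key)  # "(71,3)" -> (71, 3)
        totals[first] += count
    return dict(totals)
```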
jamesmcclain commented 8 years ago

I should note that the above is not a "production ready" solution by any stretch of the imagination.

I am tagging a few other people who might be interested in this discussion: @hectcastro @lossyrob @echeipesh @moradology @pomadchin

hectcastro commented 8 years ago

That is not the correct way to pass configuration to SJS (I think there is a way to supply a supplementary file and point to it).

I think we may have already had settings in place to override the default serializer:

```hocon
spark {
  home = "/opt/spark"
  master = "local[*]"

  context-settings.passthrough.spark.serializer = "org.apache.spark.serializer.KryoSerializer"
  context-settings.passthrough.spark.kryo.registrator = "geotrellis.spark.io.hadoop.KryoRegistrator"
}
```

I can check whether we have a mechanism to confirm that it's being set properly.

jamesmcclain commented 8 years ago

Aside from the fact that I think the Kryo registrator class is now called "geotrellis.spark.io.kryo.KryoRegistrator", that configuration looks like it takes care of both of the concerns I mentioned above.
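If the registrator class was indeed renamed, the passthrough settings would become something like the sketch below (verify the class name against the GeoTrellis 0.10 artifacts before relying on it):

```hocon
spark {
  home = "/opt/spark"
  master = "local[*]"

  context-settings.passthrough.spark.serializer = "org.apache.spark.serializer.KryoSerializer"
  # Class renamed in GeoTrellis 0.10, per the comment above:
  context-settings.passthrough.spark.kryo.registrator = "geotrellis.spark.io.kryo.KryoRegistrator"
}
```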

rajadain commented 8 years ago

@jamesmcclain So do you think you have enough information now to finish up #23?

jamesmcclain commented 8 years ago

As far as I know, the mmw-geoprocessing side of it is finished. Aside from the deployment tweaks mentioned above, everything should work.

I am essentially just waiting for a +1 on #23.

rajadain commented 8 years ago

Splendid work. I think we can go ahead and close this card. I've created https://github.com/WikiWatershed/model-my-watershed/issues/1207 to track work to be done on the main app side.