WikiWatershed / mmw-geoprocessing

A Spark Job Server job for Model My Watershed geoprocessing.
Apache License 2.0

Upgrade to GeoTrellis 0.10 #22

Closed by rajadain 8 years ago

rajadain commented 8 years ago

(Checklist taken from the comment below so that it shows up nicely in GitHub)

jamesmcclain commented 8 years ago

Here is the metadata update information.

The old metadata: metadatassurgo-hydro-groups-30m-epsg50700.json.txt

The new metadata: metadatassurgo-hydro-groups-30m-new0.json.txt

rajadain commented 8 years ago

Thanks, @jamesmcclain. The high-level status is that by making the following changes to the metadata files:

```
+ header.format
= keyBounds -> keyIndex.properties.keyBounds
+ keyIndex.type
- keyIndex.obj
+ metadata.bounds
```

we can almost get it to work with the new GeoTrellis. The remaining steps are:

The deployment will work because the setting in MMW points to the name of the metadata file, which in turn points to the path of the RDDs. Since both metadata files point to the same RDDs, we don't need two copies of them.
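The five metadata edits listed above can be sketched as a small migration script. Only the field moves themselves come from the comment; the helper name, the placeholder values for `header.format` and `keyIndex.type`, and the choice to copy `keyBounds` into `metadata.bounds` are all assumptions, not the actual GeoTrellis 0.10 layout:

```python
def migrate_metadata(old):
    """Rewrite pre-0.10 layer metadata into the (assumed) 0.10 shape.

    Applies the edits from the comment above:
      + header.format
      = keyBounds -> keyIndex.properties.keyBounds
      + keyIndex.type
      - keyIndex.obj
      + metadata.bounds
    """
    new = dict(old)
    # + header.format (the value here is a placeholder assumption)
    new.setdefault("header", {})["format"] = "geotiff"
    # = keyBounds -> keyIndex.properties.keyBounds,
    # + keyIndex.type, - keyIndex.obj
    key_index = dict(new.pop("keyIndex", {}))
    key_index.pop("obj", None)
    key_index["type"] = "zorder"  # placeholder assumption
    key_index["properties"] = {"keyBounds": new.pop("keyBounds", None)}
    new["keyIndex"] = key_index
    # + metadata.bounds (copying keyBounds here is a guess)
    new.setdefault("metadata", {})["bounds"] = key_index["properties"]["keyBounds"]
    return new
```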

jamesmcclain commented 8 years ago

Late-breaking news: I just got the 0.10-RC1 jar running on spark-jobserver. The SJS configuration must be changed to use Kryo serialization. An example application.conf is included below:

application.conf.txt

Here is just the diff from the default one that came in the SJS 0.6.1 tree:

```diff
@@ -3,7 +3,8 @@
   master = "local[4]"
   # spark web UI port
   webUrlPort = 8080
-
+  serializer = org.apache.spark.serializer.KryoSerializer
+
   jobserver {
     port = 8090
     bind-address = "0.0.0.0"
@@ -149,7 +150,7 @@
   default-host-header = "spray.io:8765"

   # Increase this in order to upload bigger job jars
-  parsing.max-content-length = 30m
+  parsing.max-content-length = 300m
 }

 shiro {
```

Here is the request.json that I used: request.json.txt

Here are the results when I use it:

```json
{
  "result": [{
    "(71,3)": 865,
    "(90,3)": 1650,
    "(22,7)": 5,
    "(11,1)": 104,
    "(11,6)": 10,
    "(52,3)": 7051,
    "(22,3)": 7,
    "(22,1)": 6,
    "(11,7)": 276,
    "(90,2)": 644,
    "(21,3)": 89,
    "(11,3)": 10998,
    "(31,7)": 208,
    "(52,6)": 40,
    "(52,2)": 1213,
    "(90,1)": 1901,
    "(43,4)": 40799,
    "(11,2)": 81,
    "(21,7)": 235,
    "(42,6)": 63,
    "(95,3)": 144,
    "(71,7)": 242,
    "(90,7)": 7473,
    "(71,1)": 207,
    "(95,2)": 8,
    "(52,7)": 5897,
    "(41,1)": 7486,
    "(22,4)": 13,
    "(41,7)": 10741,
    "(21,4)": 434,
    "(31,4)": 254,
    "(42,7)": 12392,
    "(43,3)": 21667,
    "(71,6)": 17,
    "(41,2)": 1826,
    "(52,1)": 2476,
    "(90,6)": 2324,
    "(21,2)": 19,
    "(95,7)": 471,
    "(42,4)": 30492,
    "(21,1)": 297,
    "(71,2)": 198,
    "(41,6)": 245,
    "(21,6)": 65,
    "(42,1)": 4790,
    "(43,6)": 380,
    "(43,7)": 19607,
    "(41,4)": 32622,
    "(52,4)": 9290,
    "(41,3)": 24396,
    "(71,4)": 797,
    "(42,3)": 4531,
    "(95,6)": 112,
    "(95,1)": 68,
    "(95,4)": 731,
    "(43,1)": 13473,
    "(90,4)": 11363,
    "(42,2)": 1962,
    "(11,4)": 392,
    "(31,2)": 93,
    "(31,3)": 100,
    "(31,1)": 96,
    "(43,2)": 3737
  }]
}
```
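For reference, each key in the result map is a string-encoded pair of cell values with its count. A small sketch of consuming such a response, totaling counts by the first element of each pair (the interpretation of that element as an NLCD land-cover class is an assumption here, as is the helper name):

```python
import json
from ast import literal_eval
from collections import defaultdict

def summarize(response_text):
    """Parse an SJS histogram response like the one above and
    total counts by the first element of each "(a,b)" key."""
    result = json.loads(response_text)["result"][0]
    totals = defaultdict(int)
    for key, count in result.items():
        first, _second = literal_eval(key)  # "(71,3)" -> (71, 3)
        totals[first] += count
    return dict(totals)
```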
jamesmcclain commented 8 years ago

I should note that the above is not a "production ready" solution by any stretch of the imagination.

I am tagging a few other people who might be interested in this discussion: @hectcastro @lossyrob @echeipesh @moradology @pomadchin

hectcastro commented 8 years ago

That is not the correct way to pass configuration to SJS (I think there is a way to supply a supplementary file and point to it).

I think we may have already had settings in place to override the default serializer:

```hocon
spark {
  home = "/opt/spark"
  master = "local[*]"

  context-settings.passthrough.spark.serializer = "org.apache.spark.serializer.KryoSerializer"
  context-settings.passthrough.spark.kryo.registrator = "geotrellis.spark.io.hadoop.KryoRegistrator"
}
```

I can check whether we have a mechanism to confirm that it's being set properly.

jamesmcclain commented 8 years ago

Aside from the fact that I think the Kryo registrator class is now called "geotrellis.spark.io.kryo.KryoRegistrator", that configuration looks like it takes care of both of the concerns I mentioned above.
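If the registrator class was indeed renamed, the passthrough settings would become something like the sketch below (verify the class name against the GeoTrellis 0.10 artifacts before relying on it):

```hocon
spark {
  home = "/opt/spark"
  master = "local[*]"

  context-settings.passthrough.spark.serializer = "org.apache.spark.serializer.KryoSerializer"
  # Class renamed in GeoTrellis 0.10, per the comment above:
  context-settings.passthrough.spark.kryo.registrator = "geotrellis.spark.io.kryo.KryoRegistrator"
}
```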

rajadain commented 8 years ago

@jamesmcclain So do you think you have enough information now to finish up #23?

jamesmcclain commented 8 years ago

As far as I know, the mmw-geoprocessing side of it is finished. Aside from the deployment tweaks mentioned above, everything should work.

I am essentially just waiting for a +1 on #23.

rajadain commented 8 years ago

Splendid work. I think we can go ahead and close this card. I've created https://github.com/WikiWatershed/model-my-watershed/issues/1207 to track work to be done on the main app side.