WikiWatershed / mmw-geoprocessing

A Spark Job Server job for Model My Watershed geoprocessing.
Apache License 2.0

Collections API: Use Futures #70

Closed rajadain closed 6 years ago

rajadain commented 6 years ago

Overview

By using Futures instead of immediate values, we can fetch tiles for multiple layers in parallel, speeding up the entire operation since I/O is its most expensive part.
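As a rough sketch of the idea (fetchTile here is a hypothetical stand-in for the real tile fetch, not this project's actual API):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-in for an expensive, IO-bound tile fetch.
def fetchTile(layerId: String): Int = {
  Thread.sleep(50) // simulate network/disk latency
  layerId.length
}

// Wrapping each fetch in a Future lets all layers load concurrently;
// Future.sequence collects them back into a single Future of results.
val layers = List("nlcd", "soils", "streams")
val tiles: Future[List[Int]] =
  Future.sequence(layers.map(id => Future(fetchTile(id))))
val result = Await.result(tiles, 10.seconds)
```

With sequential fetches the total time is the sum of all fetches; with Futures it approaches the time of the slowest single fetch.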

Tagging @lossyrob for code review.

Connects #67

Demo

Here are some recorded run times when running a RasterGroupedCount operation for HUC-08 in isolation:

Times (in seconds) for nlcd-soils-request-huc8.json:

| Run | Trial | Future | Natural |
|-----|-------|--------|---------|
| 1 | 1 | 6.76 | 8.25 |
| 1 | 2 | 3.11 | 4.15 |
| 1 | 3 | 2.77 | 4.92 |
| 1 | 4 | 2.52 | 3.66 |
| 1 | 5 | 3.22 | 3.73 |
| 1 | **Average** | **3.676** | **4.942** |
| 2 | 1 | 2.75 | 3.74 |
| 2 | 2 | 2.41 | 3.69 |
| 2 | 3 | 2.80 | 3.62 |
| 2 | 4 | 2.33 | 3.61 |
| 2 | 5 | 3.67 | 4.08 |
| 2 | **Average** | **2.792** | **3.748** |
| 3 | 1 | 2.42 | 4.19 |
| 3 | 2 | 2.23 | 3.31 |
| 3 | 3 | 2.48 | 3.45 |
| 3 | 4 | 2.28 | 3.59 |
| 3 | 5 | 2.82 | 3.62 |
| 3 | **Average** | **2.446** | **3.632** |
| | **Total Average** | **2.971** | **4.107** |
| | **Speed-up** | **1.38×** | |

So using Futures is clearly faster in isolation. Total speed is slower when servicing multiple requests, though:

image

Notes

I tested with nlcd-soils-request-huc8.json and with nlcd-streams-request.json and got correct results. Running many of them in parallel took longer, but everything came through in the end:

image

Note that these times are from my local machine, not from within the VM, which is likely to be a little slower.

I could not reproduce the original issue in #67, wherein a single request "blocked" the machine from accepting others. If there are any recommendations for reproducing it, I can try them, otherwise I think we should proceed with what we have and make a new card if that behavior returns.

Testing Instructions

kellyi commented 6 years ago

About to take a look at this.

rajadain commented 6 years ago

For reference here is nlcd-soils-request-huc8.json (RasterGroupedCount operation on a ~3500 sq km shape) without Futures:

image

and with:

image

Here is nlcd-streams-request.json (RasterLinesJoin operation on a few 100 sq km) without Futures:

image

and with:

image

These are the files I'm testing with:

pr70-futures-sample-requests.zip

Most notably, the initial heavy cost of fetching tiles is significantly reduced, which is great for cold requests, the kind we're likely to see most often.

lossyrob commented 6 years ago

After talking with some of the GT team, I need to learn a bit more about how to properly use Execution Contexts in this setting (based on some painful work on the Raster Foundry team around their tiler). I'll give a report here when I sort things out.

kellyi commented 6 years ago

Ran this script with the nlcd-soils-request-huc8 input simultaneously in 4 tmux panes:

#!/usr/bin/env ruby

# Hit the geoprocessing /run endpoint repeatedly with the given input
# file and print the wall-clock time of each request.
input = ARGV[0].to_s

10.times do
  # `time -p` writes to stderr; the stream swap lets grep see the timing line.
  system("/usr/bin/time -p http --timeout=90 :8090/run < #{input} 2>&1 > /dev/null | grep real")
end

Here are the results from develop:

real 10.24
real 9.81
real 9.35
real 8.86
real 9.33
real 8.33
real 7.16
real 7.15
real 6.65
real 7.63

real 10.27
real 8.37
real 10.22
real 9.24
real 8.82
real 7.56
real 7.33
real 6.81
real 7.68
real 7.11

real 12.52
real 8.24
real 9.81
real 9.41
real 8.58
real 8.31
real 6.99
real 6.75
real 7.96
real 7.10

real 9.49
real 8.63
real 9.45
real 9.09
real 8.07
real 8.77
real 7.49
real 6.80
real 6.99
real 7.46

And here's this branch:

real 8.19
real 6.71
real 5.90
real 9.33
real 8.44
real 6.71
real 9.91
real 9.21
real 5.47
real 8.65

real 16.66
real 6.21
real 10.21
real 9.52
real 8.60
real 5.70
real 8.39
real 6.16
real 6.93
real 5.37

real 4.15
real 4.65
real 11.63
real 5.53
real 9.75
real 10.07
real 11.92
real 5.86
real 6.55
real 9.93

real 6.59
real 7.30
real 12.84
real 7.33
real 12.81
real 12.39
real 10.55
real 7.07
real 4.39
real 3.76

Seems like this branch is generally faster. The VisualVM output was also pretty interesting -- let me see if I can make a gif of it.

kellyi commented 6 years ago

Not fully sure how to parse these, but this is develop, without the Futures, where the work seems to happen in the dispatchers (green represents "running", brown "parked", yellow "waiting"):

screen shot 2017-08-31 at 11 56 07 am

This is this branch, where things seem to get forked off:

screen shot 2017-08-31 at 11 59 44 am

&

screen shot 2017-08-31 at 12 00 38 pm

kellyi commented 6 years ago

+1 from me.

One thing this doesn't include is the "blocking dispatcher" suggestion made here: http://doc.akka.io/docs/akka-http/10.0.9/scala/http/handling-blocking-operations-in-akka-http-routes.html

I think we could probably delay those changes until we see how this performs without them?

rajadain commented 6 years ago

I tried the blocking dispatcher stuff like this:

diff --git a/api/src/main/resources/application.conf b/api/src/main/resources/application.conf
index e3de9fd..b3f93d2 100644
--- a/api/src/main/resources/application.conf
+++ b/api/src/main/resources/application.conf
@@ -1,3 +1,12 @@
+mmw-dispatcher {
+  type = Dispatcher
+  executor = "thread-pool-executor"
+  thread-pool-executor {
+    fixed-pool-size = 32
+  }
+  throughput = 500
+}
+
 geoprocessing {
     port = 8090
     hostname = "0.0.0.0"
diff --git a/api/src/main/scala/WebServer.scala b/api/src/main/scala/WebServer.scala
index 7a6814c..503e990 100644
--- a/api/src/main/scala/WebServer.scala
+++ b/api/src/main/scala/WebServer.scala
@@ -1,10 +1,10 @@
 package org.wikiwatershed.mmw.geoprocessing

+import akka.dispatch.MessageDispatcher
 import akka.http.scaladsl.unmarshalling.Unmarshaller._
-import akka.http.scaladsl.server.{ HttpApp, Route }
+import akka.http.scaladsl.server.{HttpApp, Route}
 import akka.http.scaladsl.marshallers.sprayjson.SprayJsonSupport._
 import spray.json._
-
 import com.typesafe.config.ConfigFactory
 import com.typesafe.scalalogging.LazyLogging

@@ -42,16 +42,21 @@ object WebServer extends HttpApp with App with LazyLogging with Geoprocessing {
     } ~
     post {
       path("run") {
-        entity(as[PostRequest]) { data =>
-          data.input.operationType match {
-            case "RasterGroupedCount" =>
-              complete(getRasterGroupedCount(data.input))
-            case "RasterGroupedAverage" =>
-              complete(getRasterGroupedAverage(data.input))
-            case "RasterLinesJoin" =>
-              complete(getRasterLinesJoin(data.input))
-            case _ =>
-              throw new Exception(s"Unknown operationType: ${data.input.operationType}")
+        extractActorSystem { system =>
+          implicit val blockingDispatcher:MessageDispatcher =
+            system.dispatchers.lookup("mmw-dispatcher")
+
+          entity(as[PostRequest]) { data =>
+            data.input.operationType match {
+              case "RasterGroupedCount" =>
+                complete(getRasterGroupedCount(data.input))
+              case "RasterGroupedAverage" =>
+                complete(getRasterGroupedAverage(data.input))
+              case "RasterLinesJoin" =>
+                complete(getRasterLinesJoin(data.input))
+              case _ =>
+                throw new Exception(s"Unknown operationType: ${data.input.operationType}")
+            }
           }
         }
       }

and tweaking the configuration with different numbers. Here are the results:

mmw-dispatcher {
  type = Dispatcher
  executor = "thread-pool-executor"
  thread-pool-executor {
    fixed-pool-size = 32
  }
  throughput = 500
}
screen shot 2017-08-31 at 3 36 05 pm
mmw-dispatcher {
  type = Dispatcher
  executor = "thread-pool-executor"
  thread-pool-executor {
    fixed-pool-size = 32
  }
  throughput = 50
}
screen shot 2017-08-31 at 3 34 55 pm
mmw-dispatcher {
  type = Dispatcher
  executor = "thread-pool-executor"
  thread-pool-executor {
    fixed-pool-size = 32
  }
  throughput = 100
}
screen shot 2017-08-31 at 3 33 36 pm
mmw-dispatcher {
  type = Dispatcher
  executor = "thread-pool-executor"
  thread-pool-executor {
    fixed-pool-size = 16
  }
  throughput = 100
}
screen shot 2017-08-31 at 3 32 12 pm

All of these numbers seem worse than the default performance. Tweaking them may be useful in certain contexts, but since we don't yet have a good enough understanding of the different dispatcher types and how their parameters affect runtime, I'd lean towards forgoing this for now.

I will, however, wait for @lossyrob to comment on the wisdom of using the global ExecutionContext.
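(For context, the usual alternative to the global ExecutionContext is a dedicated pool for blocking IO, roughly along these lines; the pool size here is a hypothetical placeholder, not a tuned value:)

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// A fixed-size pool dedicated to blocking tile IO. Unlike
// ExecutionContext.Implicits.global (a fork-join pool sized for
// CPU-bound work), a separate pool keeps blocking fetches from
// starving the rest of the application.
val ioThreads = 8 // hypothetical size; would need tuning
val pool = Executors.newFixedThreadPool(ioThreads)
val blockingEc: ExecutionContext = ExecutionContext.fromExecutorService(pool)

// Work submitted with this context runs on the dedicated pool.
val answer = Await.result(Future(21 * 2)(blockingEc), 5.seconds)

pool.shutdown() // release the non-daemon pool threads on exit
```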

lossyrob commented 6 years ago

I'm getting caught up in other things, and I don't want to be a blocker. Since this has a +1 and the remaining questions are optimizations, perhaps we merge this and write an issue to investigate potential optimizations around ExecutionContext usage?

rajadain commented 6 years ago

No problem! I created #71 to track that effort. I don't think there's any real rush on it.

@kellyi we should probably create a 3.0.0-beta-2 now that this has been merged, and upgrade MMW to use it.

kellyi commented 6 years ago

we should probably create a 3.0.0-beta-2 now that this has been merged, and upgrade MMW to use it.

Sounds good to me. I'll make a new release and create the corresponding MMW PR.