About to take a look at this.
For reference, here are timings for nlcd-soils-request-huc8.json (a RasterGroupedCount operation on a ~3500 sq km shape), without Futures and with:
And here are timings for nlcd-streams-request.json (a RasterLinesJoin operation on a few hundred sq km), without Futures and with:
These are the files I'm testing with:
pr70-futures-sample-requests.zip
Most notably, the initial heavy load time of fetching tiles is significantly reduced, which is great for cold requests, the kind we're likely to see most often.
After talking with some of the GT team, I need to learn a bit more about how to properly use Execution Contexts in this setting (based on some painful work on the Raster Foundry team around their tiler). I'll give a report here when I sort things out.
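For reference, the usual alternative to the global ExecutionContext is a dedicated pool for blocking IO. A minimal sketch of what that looks like (the name and pool size here are illustrative, not code from this branch):

import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext

// scala.concurrent.ExecutionContext.Implicits.global is backed by a
// fork-join pool sized to the CPU count, so blocking tile IO can starve it.
// A dedicated fixed-size pool isolates that blocking work:
implicit val blockingIoEc: ExecutionContext =
  ExecutionContext.fromExecutor(Executors.newFixedThreadPool(32))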
Ran this script with nlcd-soils-request-huc8.json simultaneously in 4 tmux panes:
#!/usr/bin/env ruby
input = ARGV[0].to_s

# POST the given request body 10 times; `time -p` writes to stderr,
# so redirect stderr into the pipe for grep and discard the response body.
10.times do
  system("/usr/bin/time -p http --timeout=90 :8090/run < #{input} 2>&1 > /dev/null | grep real")
end
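Assuming the script above is saved as bench.rb (a hypothetical name) and made executable, each pane ran something like ./bench.rb nlcd-soils-request-huc8.json.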
Here are the results from develop:
real 10.24
real 9.81
real 9.35
real 8.86
real 9.33
real 8.33
real 7.16
real 7.15
real 6.65
real 7.63
real 10.27
real 8.37
real 10.22
real 9.24
real 8.82
real 7.56
real 7.33
real 6.81
real 7.68
real 7.11
real 12.52
real 8.24
real 9.81
real 9.41
real 8.58
real 8.31
real 6.99
real 6.75
real 7.96
real 7.10
real 9.49
real 8.63
real 9.45
real 9.09
real 8.07
real 8.77
real 7.49
real 6.80
real 6.99
real 7.46
And here's this branch:
real 8.19
real 6.71
real 5.90
real 9.33
real 8.44
real 6.71
real 9.91
real 9.21
real 5.47
real 8.65
real 16.66
real 6.21
real 10.21
real 9.52
real 8.60
real 5.70
real 8.39
real 6.16
real 6.93
real 5.37
real 4.15
real 4.65
real 11.63
real 5.53
real 9.75
real 10.07
real 11.92
real 5.86
real 6.55
real 9.93
real 6.59
real 7.30
real 12.84
real 7.33
real 12.81
real 12.39
real 10.55
real 7.07
real 4.39
real 3.76
Seems like this branch is generally faster. The VisualVM output was also pretty interesting -- let me see if I can make a gif of it.
Not fully sure how to parse these, but this is develop, without the Futures, where the work seems to happen in the dispatchers (green represents running, brown is "parked", yellow is "waiting"):
This is this branch, where things seem to get forked off:
+1 from me.
One thing this doesn't include is the "blocking-dispatcher" suggestion made here: http://doc.akka.io/docs/akka-http/10.0.9/scala/http/handling-blocking-operations-in-akka-http-routes.html
I think we could probably delay thinking about those changes until we see how this performs without them?
I tried the blocking dispatcher stuff like this:
diff --git a/api/src/main/resources/application.conf b/api/src/main/resources/application.conf
index e3de9fd..b3f93d2 100644
--- a/api/src/main/resources/application.conf
+++ b/api/src/main/resources/application.conf
@@ -1,3 +1,12 @@
+mmw-dispatcher {
+  type = Dispatcher
+  executor = "thread-pool-executor"
+  thread-pool-executor {
+    fixed-pool-size = 32
+  }
+  throughput = 500
+}
+
 geoprocessing {
   port = 8090
   hostname = "0.0.0.0"
diff --git a/api/src/main/scala/WebServer.scala b/api/src/main/scala/WebServer.scala
index 7a6814c..503e990 100644
--- a/api/src/main/scala/WebServer.scala
+++ b/api/src/main/scala/WebServer.scala
@@ -1,10 +1,10 @@
 package org.wikiwatershed.mmw.geoprocessing

+import akka.dispatch.MessageDispatcher
 import akka.http.scaladsl.unmarshalling.Unmarshaller._
-import akka.http.scaladsl.server.{ HttpApp, Route }
+import akka.http.scaladsl.server.{HttpApp, Route}
 import akka.http.scaladsl.marshallers.sprayjson.SprayJsonSupport._
 import spray.json._
-
 import com.typesafe.config.ConfigFactory
 import com.typesafe.scalalogging.LazyLogging
@@ -42,16 +42,21 @@ object WebServer extends HttpApp with App with LazyLogging with Geoprocessing {
       } ~
       post {
         path("run") {
-          entity(as[PostRequest]) { data =>
-            data.input.operationType match {
-              case "RasterGroupedCount" =>
-                complete(getRasterGroupedCount(data.input))
-              case "RasterGroupedAverage" =>
-                complete(getRasterGroupedAverage(data.input))
-              case "RasterLinesJoin" =>
-                complete(getRasterLinesJoin(data.input))
-              case _ =>
-                throw new Exception(s"Unknown operationType: ${data.input.operationType}")
+          extractActorSystem { system =>
+            implicit val blockingDispatcher: MessageDispatcher =
+              system.dispatchers.lookup("mmw-dispatcher")
+
+            entity(as[PostRequest]) { data =>
+              data.input.operationType match {
+                case "RasterGroupedCount" =>
+                  complete(getRasterGroupedCount(data.input))
+                case "RasterGroupedAverage" =>
+                  complete(getRasterGroupedAverage(data.input))
+                case "RasterLinesJoin" =>
+                  complete(getRasterLinesJoin(data.input))
+                case _ =>
+                  throw new Exception(s"Unknown operationType: ${data.input.operationType}")
+              }
             }
           }
         }
I also tried tweaking the configuration with different numbers. Here are the results:
mmw-dispatcher {
  type = Dispatcher
  executor = "thread-pool-executor"
  thread-pool-executor {
    fixed-pool-size = 32
  }
  throughput = 500
}

mmw-dispatcher {
  type = Dispatcher
  executor = "thread-pool-executor"
  thread-pool-executor {
    fixed-pool-size = 32
  }
  throughput = 50
}

mmw-dispatcher {
  type = Dispatcher
  executor = "thread-pool-executor"
  thread-pool-executor {
    fixed-pool-size = 32
  }
  throughput = 100
}

mmw-dispatcher {
  type = Dispatcher
  executor = "thread-pool-executor"
  thread-pool-executor {
    fixed-pool-size = 16
  }
  throughput = 100
}
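For context, here is what those two knobs mean in a standard Akka dispatcher configuration (this is general Akka behavior, not anything specific to this branch):

mmw-dispatcher {
  type = Dispatcher                  # event-based dispatcher backed by its own pool
  executor = "thread-pool-executor"  # plain fixed-size thread pool, suited to blocking IO
  thread-pool-executor {
    fixed-pool-size = 32             # number of threads available to this dispatcher
  }
  throughput = 500                   # messages a thread processes from one actor's
                                     # mailbox before moving on to the next actor
}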
All of these configurations performed worse than the default. Tweaking them may be useful in certain contexts, but since we don't have a good enough understanding of the different types of dispatchers available and how their parameters affect runtime, I'd lean towards forgoing this for now.
I will, however, wait for @lossyrob to comment on the wisdom of using the global ExecutionContext.
I'm getting caught up in other things, and I don't want to be a blocker. Since this is a +1, given that there may be optimizations, perhaps we merge this and write an issue to investigate potential optimizations around ExecutionContext usage?
No problem! I created #71 to track that effort. I don't think there's any real rush on it.
@kellyi we should probably create a 3.0.0-beta-2 now that this has been merged, and upgrade MMW to use it.
Sounds good to me. I'll make a new release and create the corresponding MMW PR.
Overview
By using Futures instead of immediate values, we can fetch tiles for multiple layers in parallel, speeding up the entire operation, since the IO is the most expensive part.
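The gist, as a minimal self-contained sketch (fetchLayer and the layer names are hypothetical stand-ins for the real tile reads, not code from this PR):

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FuturesSketch extends App {
  // Stand-in for a slow, IO-bound tile fetch
  def fetchLayer(layerId: String): Seq[Int] = {
    Thread.sleep(1000) // simulate tile IO latency
    Seq(1, 2, 3)
  }

  // Before: sequential fetches, roughly 2s total
  // val nlcd  = fetchLayer("nlcd")
  // val soils = fetchLayer("soils")

  // After: both fetches start as soon as the Futures are created
  // and run concurrently, roughly 1s total
  val nlcdF  = Future(fetchLayer("nlcd"))
  val soilsF = Future(fetchLayer("soils"))

  val combined: Future[(Seq[Int], Seq[Int])] =
    for {
      nlcd  <- nlcdF
      soils <- soilsF
    } yield (nlcd, soils)

  println(Await.result(combined, 5.seconds))
}

Because both Futures are created before the for-comprehension sequences them, the two fetches overlap instead of running back to back.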
Tagging @lossyrob for code review.
Connects #67
Demo
Here are some recorded run times when running a RasterGroupedCount operation for HUC-08 in isolation:
So using Futures is clearly faster. Total times are slower when servicing multiple requests at once, though:
Notes
I tested with nlcd-soils-request-huc8.json and with nlcd-streams-request.json and got correct results. Running many of them in parallel took longer, but everything came through in the end. Note that these times are from my local machine, not from within the VM, which is likely to be a little slower.
I could not reproduce the original issue in #67, wherein a single request "blocked" the machine from accepting others. If there are any recommendations for reproducing it, I can try them; otherwise I think we should proceed with what we have and make a new card if that behavior returns.
Testing Instructions
server