WikiWatershed / mmw-geoprocessing

A Spark Job Server job for Model My Watershed geoprocessing.
Apache License 2.0

Subbasin MultiOperation Benchmark #87

Closed rajadain closed 6 years ago

rajadain commented 6 years ago

Overview

Adds two benchmarking scripts, benchmark-mapshed and benchmark-subbasin, that run MapShed and Subbasin the "naive" way, using the existing /run endpoint. Also updates the existing benchmark script to run MapShed and Subbasin the new way, using the /multi endpoint. Adds a Pipenv file for managing the Python dependencies of the benchmark scripts, and updates the setup script to install those dependencies. Adds a top-level benchmark script that runs all three sub-scripts based on the given flags.

These benchmarks run against http://localhost:8090/. When running ./scripts/server in this project, the benchmarks run against a server running natively on the developer's machine. When running against MMW, the benchmarks run against a server in the Worker VM.
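For context, the timing loop in each benchmark boils down to something like the sketch below. This is hypothetical: the payload here is a placeholder, and the actual request bodies are the JSON benchmark inputs checked into this repo, so the real scripts may differ in detail.

```python
import time
import requests

HOST = "http://localhost:8090"

# Placeholder payload; the real benchmark inputs are JSON bodies in this repo
# and their exact shape is not reproduced here.
payload = {"shapes": ["..."], "operations": ["..."]}

def time_endpoint(path, body, runs=5):
    """POST `body` to `path` `runs` times and return the elapsed seconds per run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        requests.post(HOST + path, json=body).raise_for_status()
        timings.append(time.perf_counter() - start)
    return timings

naive = time_endpoint("/run", payload)    # existing endpoint, one operation per request
multi = time_endpoint("/multi", payload)  # new endpoint, all operations in one request
print("naive average:", sum(naive) / len(naive))
print("multi average:", sum(multi) / len(multi))
```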

These are some graphs from benchmarks comparing these cases across five runs:

[Benchmark graph 1 of 2: runtimes across five runs]

[Benchmark graph 2 of 2: runtimes across five runs]

In both graphs, the blue lines show the current implementation and the orange lines the new one. The solid lines are native runs, and the dashed lines are runs in the Worker VM. Notably, Subbasin did not finish in the Worker VM at all.

The difference between native and Worker VM runs is due to resource allocation: the Worker VM has only 2 vCPUs and 2 GB RAM, whereas my native machine has a quad-core CPU with 16 GB of RAM. The Staging and Production environments have beefier Worker VMs, and correspondingly shorter runtimes. This process appears to be more CPU bound than RAM bound.

Connects #83

Demo

```
$ ./scripts/benchmark --sync
```
```
Timing RasterGroupedAverage ->
RasterGroupedAverage, run 1 -> 1.310392 s
RasterGroupedAverage, run 2 -> 0.666542 s
RasterGroupedAverage, run 3 -> 0.455111 s
RasterGroupedAverage, run 4 -> 0.240777 s
RasterGroupedAverage, run 5 -> 3.789703 s
RasterGroupedAverage average -> 1.2925049999999998 s
Timing HUC8 RasterLinesJoin ->
HUC8 RasterLinesJoin, run 1 -> 1.282524 s
HUC8 RasterLinesJoin, run 2 -> 1.219903 s
HUC8 RasterLinesJoin, run 3 -> 1.346176 s
HUC8 RasterLinesJoin, run 4 -> 1.509372 s
HUC8 RasterLinesJoin, run 5 -> 1.572968 s
HUC8 RasterLinesJoin average -> 1.3861885999999999 s
Timing HUC8 RasterSummary ->
HUC8 RasterSummary, run 1 -> 0.232893 s
HUC8 RasterSummary, run 2 -> 0.465685 s
HUC8 RasterSummary, run 3 -> 0.164735 s
HUC8 RasterSummary, run 4 -> 2.80515 s
HUC8 RasterSummary, run 5 -> 0.60106 s
HUC8 RasterSummary average -> 0.8539046000000001 s
```
```
$ ./scripts/benchmark --mapshed
```
```
Timing HUC8 MultiOperation MapShed ->
HUC8 MultiOperation MapShed, run 1 -> 13.442911 s
HUC8 MultiOperation MapShed, run 2 -> 6.674352 s
HUC8 MultiOperation MapShed, run 3 -> 6.212541 s
HUC8 MultiOperation MapShed, run 4 -> 6.86085 s
HUC8 MultiOperation MapShed, run 5 -> 6.103041 s
HUC8 MultiOperation MapShed average -> 7.858739 s
Timing HUC8 MapShed, separate requests per operation ->
HUC8 MapShed, 7 operations, run 1 -> 8.18 s
HUC8 MapShed, 7 operations, run 2 -> 8.30 s
HUC8 MapShed, 7 operations, run 3 -> 8.03 s
HUC8 MapShed, 7 operations, run 4 -> 10.61 s
HUC8 MapShed, 7 operations, run 5 -> 9.56 s
```
```
$ ./scripts/benchmark --subbasin
```
```
Timing 61 HUC12 MultiOperation Subbasin ->
61 HUC12 MultiOperation Subbasin, run 1 -> 12.072989 s
61 HUC12 MultiOperation Subbasin, run 2 -> 12.301599 s
61 HUC12 MultiOperation Subbasin, run 3 -> 12.132871 s
61 HUC12 MultiOperation Subbasin, run 4 -> 12.51495 s
61 HUC12 MultiOperation Subbasin, run 5 -> 13.357691 s
61 HUC12 MultiOperation Subbasin average -> 12.47602 s
Timing 61 HUC12 Subbasin, separate requests per operation ->
61 HUC12 Subbasin, 7 operations, run 1 -> 78.48 s
61 HUC12 Subbasin, 7 operations, run 2 -> 92.60 s
61 HUC12 Subbasin, 7 operations, run 3 -> 75.59 s
61 HUC12 Subbasin, 7 operations, run 4 -> 59.18 s
61 HUC12 Subbasin, 7 operations, run 5 -> 67.95 s
```

Notes

The new pipenv dependency should be added to the README, but that is far too out of date, so I decided to only add a message to the setup script. #56 exists to update the README.

The request-timeout value was added to allow longer requests to complete, since they can take more than the default 20 s (some of the MultiOperation requests for Schuylkill-sized HUC-8s take 45+ seconds).
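If the benchmark scripts use the requests library for these calls (an assumption, not something verified here), the client-side timeout also has to be raised to match; a minimal sketch:

```python
import requests

# Hypothetical call shape; the actual benchmark payloads are the JSON inputs
# in this repo. The client timeout must comfortably exceed the server's
# raised request-timeout, or 45+ second MultiOperation requests would be
# aborted on the client side.
response = requests.post(
    "http://localhost:8090/multi",
    json={},      # placeholder body, not a real MultiOperation input
    timeout=120,  # seconds
)
response.raise_for_status()
```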

Testing Instructions

arottersman commented 6 years ago

Taking a look now

arottersman commented 6 years ago

This is so convenient!

Should be ned-nhdplus-30m-epsg5070-512

rajadain commented 6 years ago

> In the graphs above, are the runs all on a warm server or are the first always on a cold one?

In the MapShed case, yes. In the Subbasin case, they may all be from a warm run. The individual runs matter less than the average times. I averaged the times and made another chart for them:

[Graph: average runtimes per case]
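For example, taking the HUC8 MultiOperation MapShed numbers from the demo output above, the effect of the cold first run on the average is easy to check (a quick arithmetic sanity check, not part of the benchmark scripts):

```python
# HUC8 MultiOperation MapShed timings from the demo output, in seconds.
runs = [13.442911, 6.674352, 6.212541, 6.86085, 6.103041]

print(sum(runs) / len(runs))          # 7.858739 -> the average reported above
print(sum(runs[1:]) / len(runs[1:]))  # ~6.46    -> average without the cold first run
```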

rajadain commented 6 years ago

> One of the benchmarking inputs references the old version of NED

Good catch. I updated all the layers in all the tests and samples in 0bebf81.

arottersman commented 6 years ago

We will likely want to do one final set of tests comparing current staging to the eventual multi-operation staging. One thing the tests in this PR don't capture is the existence of multiple workers in addition to multiple threads. It's possible the second worker would bring the "regular" MapShed job's runtime down to about the same as the multi-operation version's.

rajadain commented 6 years ago

That is true. The benchmarking here is very narrowly focused on the geoprocessing service. Should we add a card to the main repo to develop end-to-end benchmarks?

arottersman commented 6 years ago

Thinking on this some more... let's definitely make the card, but maybe not do it. We have a lot of other sub-basin work to crunch on, and the multi-operation work has made it all possible. Whether the MapShed timings end up being more or less similar likely won't affect our moving forward with it.

rajadain commented 6 years ago

Thanks for reviewing this! I made https://github.com/WikiWatershed/model-my-watershed/issues/2736 for benchmarking the main app. We can bring it in when appropriate.