Closed rajadain closed 6 years ago
Q: instead of using the scripts above will the existing ./scripts/benchmark
work just as well for testing purposes?
Nevermind, looks like we don't have a benchmarking test on RasterLinesJoin? Wonder if it's worth adding one.
I added the latest RasterLinesJoin
to the benchmark script with the old behavior, then added my fix, then updated the benchmark script with the new behavior, if you'd like to use that instead.
The order of commits in GitHub is incorrect (because it uses Author date, not Commit date). The true order is this:
* 118eeea - (HEAD -> tt/crop-lines-to-aoi, origin/tt/crop-lines-to-aoi) Update RasterLinesJoin benchmark (66 seconds ago) <Terence Tuhinanshu>
* e9e0aa6 - Crop incoming lines to area of interest (83 seconds ago) <Terence Tuhinanshu>
* 1818cb8 - Add RasterLinesJoin benchmark (83 seconds ago) <Terence Tuhinanshu>
* de66670 - (origin/develop, origin/HEAD, develop) Merge pull request #75 from WikiWatershed/ki/improve-error-reporting (8 weeks ago) <Kelly Innes>
Great! Thanks for adding that.
RLJ timings on develop
:
Timing HUC8 RasterLinesJoin ->
HUC8 RasterLinesJoin, run 1 -> 6.944408 s
HUC8 RasterLinesJoin, run 2 -> 1.918769 s
HUC8 RasterLinesJoin, run 3 -> 2.046422 s
HUC8 RasterLinesJoin, run 4 -> 1.342596 s
HUC8 RasterLinesJoin, run 5 -> 1.817221 s
HUC8 RasterLinesJoin average -> 2.8138832 s
This is probably a more accurate RLJ run from develop
since it leaves off the initial 7 second warmup run:
Timing HUC8 RasterLinesJoin ->
HUC8 RasterLinesJoin, run 1 -> 2.000698 s
HUC8 RasterLinesJoin, run 2 -> 1.803359 s
HUC8 RasterLinesJoin, run 3 -> 1.714622 s
HUC8 RasterLinesJoin, run 4 -> 1.753615 s
HUC8 RasterLinesJoin, run 5 -> 1.450551 s
HUC8 RasterLinesJoin average -> 1.744569 s
Here's the benchmark timing from this branch after a few runs:
Timing HUC8 RasterLinesJoin ->
HUC8 RasterLinesJoin, run 1 -> 1.379907 s
HUC8 RasterLinesJoin, run 2 -> 1.380688 s
HUC8 RasterLinesJoin, run 3 -> 1.442743 s
HUC8 RasterLinesJoin, run 4 -> 1.693359 s
HUC8 RasterLinesJoin, run 5 -> 1.430437 s
HUC8 RasterLinesJoin average -> 1.4654268 s
Ensure the two output files are different.
Output differs:
$ diff develop-output.json develop-clipped-streams.json
3,5c3,5
< "List(42)": 2149,
< "List(22)": 3252,
< "List(43)": 2387,
---
> "List(42)": 2215,
> "List(22)": 3253,
> "List(43)": 2393,
7,8c7,8
< "List(41)": 33341,
< "List(21)": 11553,
---
> "List(41)": 33404,
> "List(21)": 11558,
12,14c12,14
< "List(90)": 14005,
< "List(52)": 2874,
< "List(11)": 8095,
---
> "List(90)": 14014,
> "List(52)": 2876,
> "List(11)": 8128,
16,17c16,17
< "List(82)": 4736,
< "List(81)": 10301,
---
> "List(82)": 4745,
> "List(81)": 10314,
New output from this branch matches the develop-output.json
above:
This seems like it's good to go in: it works as described by the testing instructions, and the benchmarking scripts don't show much much difference in the timings. Makes me wonder whether there are other operations we're doing Python-side that could be sped up by moving them here.
I apologize that this comment comes from a place of limited context, but how does this relate to the existing work done in https://github.com/WikiWatershed/model-my-watershed/pull/2506? My concern about the comparison between PostGIS vs this branch is that it may have been done in the context of VM resources in the development environment. This may indeed be an improvement, but I'd like to see a more detailed description of the decision points and observations that led to this changeset, and how it relates to the previous work proposed.
Spoke with @mmcfarland in person about the motivations and effects, and we have a green light. I'm going to merge this and make a new release, then include it in WikiWatershed/model-my-watershed#2502.
Overview
Previously we were cropping this in PostGIS which was taking forever (see WikiWatershed/model-my-watershed#2426 and WikiWatershed/model-my-watershed#2502). Since doing it in Scala is negligibly slower than not, we shift the cropping to the geoprocessing service from PostGIS.
Connects WikiWatershed/model-my-watershed#2502 :zap:
Testing Instructions
Don't check out this branch yet. Download this request file: develop.json.txt. It is the
RasterLinesJoin
request for the Schuylkill HUC-08 with clipped streams, i.e. the current behavior ondevelop
.Run the server with
./scripts/server
.POST the above request and save the output:
http :8090/run < develop.json.txt > develop-output.json
Run a simple benchmark test to get a performance baseline:
for ((n = 0; n < 5; n++)); do { time -p curl --silent --data-binary @develop.json.txt 'http://localhost:8090/run' ;} 2>&1 | grep real | awk '{print $2}'; done
Download this request file: mapshed-optimize-streams.json.txt. This is the
RasterLinesJoin
request for the Schuylkill HUC-08 with unclipped streams, i.e. the behavior proposed in WikiWatershed/model-my-watershed#2506.POST the above request and save the output:
http :8090/run < mapshed-optimize-streams.json.txt > develop-unclipped-output.json
Ensure the two output files are different.
Now check out this branch.
POST the above request again, and save the output to a new file:
http :8090/run < mapshed-optimize-streams.json.txt > mapshed-optimize-streams-output.json
Ensure that this file is the same as the original
develop-output.json
file.Run the simple benchmark again to ensure that the change from the baseline is minimal:
for ((n = 0; n < 5; n++)); do { time -p curl --silent --data-binary @mapshed-optimize-streams.json.txt 'http://localhost:8090/run' ;} 2>&1 | grep real | awk '{print $2}'; done