rajadain closed this pull request 7 years ago
Looks like I'm getting a few slightly different values for the huc8 json:
diff --git a/nlcd-soils-response-huc8.json b/nlcd-soils-response-huc8-new.json
index 70f8bc4..3ab2fde 100644
--- a/nlcd-soils-response-huc8.json
+++ b/nlcd-soils-response-huc8-new.json
@@ -101,7 +101,7 @@
"List(11, 2, 28)": 166,
"List(95, 7, 40)": 181,
"List(23, 6, 41)": 663,
- "List(23, 7, -2147483648)": 362,
+ "List(23, 7, -2147483648)": 345,
"List(22, 4, 40)": 3155,
"List(52, 2, 41)": 33832,
"List(11, 6, -2147483648)": 59,
@@ -125,7 +125,7 @@
"List(95, 2, 34)": 1,
"List(43, 3, 2)": 33,
"List(52, 6, 38)": 148,
- "List(22, -2147483648, -2147483648)": 29987,
+ "List(22, -2147483648, -2147483648)": 29627,
"List(21, -2147483648, 28)": 2752,
"List(22, -2147483648, 34)": 543,
"List(52, 2, 38)": 3976,
@@ -145,7 +145,6 @@
"List(21, 2, 45)": 320,
"List(52, 4, 2)": 5,
"List(21, 6, -2147483648)": 639,
- "List(81, -2147483648, 16)": 12,
"List(82, 2, -2147483648)": 6291,
"List(43, 4, 38)": 383,
"List(52, -2147483648, 41)": 4454,
@@ -195,7 +194,7 @@
"List(95, 7, 16)": 4,
"List(82, 1, 41)": 38098,
"List(23, 6, 28)": 45,
- "List(81, -2147483648, -2147483648)": 1012,
+ "List(81, -2147483648, -2147483648)": 962,
"List(52, 7, 30)": 2,
"List(71, 6, 41)": 150,
"List(24, 6, 41)": 158,
@@ -221,8 +220,8 @@
"List(90, 4, 40)": 2509,
"List(41, 7, 44)": 20,
"List(95, -2147483648, 40)": 137,
- "List(24, 7, -2147483648)": 61,
- "List(24, -2147483648, -2147483648)": 42048,
+ "List(24, 7, -2147483648)": 56,
+ "List(24, -2147483648, -2147483648)": 41672,
"List(82, 6, 28)": 33,
"List(81, 2, 40)": 185423,
"List(11, 3, 2)": 10,
@@ -242,8 +241,8 @@
"List(82, 3, 44)": 1,
"List(11, 6, 28)": 94,
"List(95, 6, 40)": 503,
- "List(21, -2147483648, 16)": 167,
- "List(23, -2147483648, 16)": 385,
+ "List(21, -2147483648, 16)": 141,
+ "List(23, -2147483648, 16)": 298,
"List(11, 2, 38)": 416,
"List(41, -2147483648, 2)": 98,
"List(82, 2, 34)": 3933,
@@ -255,9 +254,9 @@
"List(11, -2147483648, 16)": 72,
"List(42, 2, 28)": 1217,
"List(23, 1, 2)": 5,
- "List(41, -2147483648, -2147483648)": 7270,
+ "List(41, -2147483648, -2147483648)": 7249,
"List(0, -2147483648, -2147483648)": 21,
- "List(90, -2147483648, 41)": 3873,
+ "List(90, -2147483648, 41)": 3872,
"List(42, 1, 41)": 3859,
"List(11, 6, 2)": 5,
"List(23, 2, 38)": 5058,
@@ -285,12 +284,12 @@
"List(43, 2, 41)": 8036,
"List(42, 3, 34)": 3,
"List(42, 3, 38)": 158,
- "List(21, 7, 41)": 30994,
+ "List(21, 7, 41)": 30985,
"List(21, 2, 44)": 3,
"List(23, -2147483648, 30)": 236,
"List(41, 2, 38)": 25936,
"List(11, 4, 28)": 208,
- "List(22, 7, 41)": 7220,
+ "List(22, 7, 41)": 7187,
"List(95, 7, 38)": 29,
"List(24, 3, 38)": 325,
"List(82, -2147483648, 38)": 1083,
@@ -385,7 +384,7 @@
"List(0, 6, 41)": 3,
"List(71, 3, 38)": 230,
"List(90, 6, 34)": 2,
- "List(21, 7, -2147483648)": 1543,
+ "List(21, 7, -2147483648)": 1530,
"List(82, 3, 28)": 373,
"List(95, 2, 40)": 234,
"List(71, 7, 28)": 51,
@@ -421,8 +420,8 @@
"List(21, 2, 41)": 123837,
"List(21, 2, 34)": 791,
"List(11, 6, 38)": 20,
- "List(82, -2147483648, -2147483648)": 719,
- "List(22, -2147483648, 41)": 70960,
+ "List(82, -2147483648, -2147483648)": 717,
+ "List(22, -2147483648, 41)": 70947,
"List(82, 2, 30)": 4,
"List(90, 2, 34)": 31,
"List(90, 2, 38)": 1076,
@@ -445,7 +444,7 @@
"List(81, 1, 2)": 87,
"List(42, 3, 40)": 807,
"List(42, 6, 40)": 693,
- "List(31, 7, -2147483648)": 16,
+ "List(31, 7, -2147483648)": 8,
"List(90, 1, 28)": 19,
"List(11, 1, 28)": 1362,
"List(22, 1, 44)": 19,
@@ -475,7 +474,6 @@
"List(22, 2, 38)": 10925,
"List(21, 2, 2)": 35,
"List(22, 4, 28)": 1111,
- "List(71, -2147483648, 16)": 56,
"List(52, 6, 40)": 576,
"List(52, 1, 2)": 16,
"List(90, 1, 44)": 6,
@@ -483,7 +481,7 @@
"List(22, 1, 28)": 2997,
"List(21, 4, 30)": 5,
"List(81, 2, 30)": 5,
- "List(23, 7, 41)": 1685,
+ "List(23, 7, 41)": 1681,
"List(11, 1, -2147483648)": 644,
"List(23, 6, 30)": 2,
"List(90, -2147483648, 30)": 5,
@@ -492,7 +490,7 @@
"List(31, 4, 41)": 735,
"List(21, 7, 28)": 917,
"List(52, -2147483648, 45)": 5,
- "List(71, -2147483648, -2147483648)": 239,
+ "List(71, -2147483648, -2147483648)": 183,
"List(24, 2, 30)": 1,
"List(21, 1, 28)": 11665,
"List(71, 4, 40)": 378,
@@ -518,8 +516,7 @@
"List(82, -2147483648, 41)": 3902,
"List(23, -2147483648, 2)": 1,
"List(41, 1, 44)": 477,
- "List(52, -2147483648, 16)": 12,
- "List(52, -2147483648, -2147483648)": 509,
+ "List(52, -2147483648, -2147483648)": 494,
"List(43, 2, -2147483648)": 559,
"List(82, 4, -2147483648)": 1147,
"List(42, -2147483648, 34)": 9,
@@ -599,7 +596,7 @@
"List(42, 7, 2)": 36,
"List(81, 3, 44)": 10,
"List(90, -2147483648, 45)": 13,
- "List(22, -2147483648, 16)": 255,
+ "List(22, -2147483648, 16)": 206,
"List(21, 1, 38)": 4884,
"List(11, 4, -2147483648)": 109,
"List(42, 7, 38)": 64,
@@ -647,7 +644,7 @@
"List(22, 1, 45)": 2,
"List(21, 2, 40)": 73631,
"List(82, 3, -2147483648)": 1234,
- "List(21, -2147483648, -2147483648)": 24285,
+ "List(21, -2147483648, -2147483648)": 24079,
"List(0, -2147483648, 38)": 114,
"List(23, 4, 28)": 451,
"List(24, 6, 28)": 4,
@@ -661,7 +658,7 @@
"List(42, 2, 34)": 13,
"List(21, 3, 28)": 1371,
"List(81, 1, 40)": 62844,
- "List(23, -2147483648, 41)": 39571,
+ "List(23, -2147483648, 41)": 39561,
"List(24, -2147483648, 38)": 24365,
"List(0, -2147483648, 45)": 1,
"List(31, 1, 28)": 8346,
@@ -715,7 +712,7 @@
"List(52, 7, 28)": 74,
"List(90, 7, 44)": 10,
"List(95, 2, -2147483648)": 29,
- "List(22, 7, -2147483648)": 538,
+ "List(22, 7, -2147483648)": 494,
"List(43, 7, 28)": 1074,
"List(43, -2147483648, 41)": 1121,
"List(81, 1, 44)": 97,
@@ -744,7 +741,7 @@
"List(81, 2, 2)": 122,
"List(21, -2147483648, 45)": 304,
"List(22, 2, 40)": 21162,
- "List(21, -2147483648, 41)": 72830,
+ "List(21, -2147483648, 41)": 72825,
"List(90, -2147483648, 16)": 33,
"List(21, 3, 45)": 40,
"List(21, 7, 16)": 48,
@@ -761,7 +758,7 @@
"List(82, 4, 28)": 274,
"List(41, -2147483648, 24)": 2,
"List(81, 6, 41)": 12230,
- "List(90, -2147483648, -2147483648)": 1160,
+ "List(90, -2147483648, -2147483648)": 1070,
"List(41, 7, 40)": 35307,
"List(42, 3, -2147483648)": 233,
"List(90, 3, 44)": 12,
@@ -804,7 +801,7 @@
"List(41, 4, 45)": 6,
"List(11, 1, 38)": 18,
"List(21, 6, 45)": 13,
- "List(23, -2147483648, -2147483648)": 45630,
+ "List(23, -2147483648, -2147483648)": 45106,
"List(81, 7, -2147483648)": 558,
"List(71, -2147483648, 2)": 8,
"List(41, 2, 28)": 18416,
@@ -812,7 +809,7 @@
"List(71, 1, 2)": 70,
"List(23, 2, 34)": 261,
"List(11, 6, 41)": 945,
- "List(24, -2147483648, 16)": 183,
+ "List(24, -2147483648, 16)": 141,
"List(42, -2147483648, 28)": 395,
"List(22, 2, 45)": 96,
"List(31, -2147483648, 34)": 49,
The responses from the other JSON were the same, btw.
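For readers decoding the keys above: each `List(a, b, c)` key appears to be a tuple of pixel values for the three requested layers, and `-2147483648` is `Int.MinValue`, the sentinel GeoTrellis uses as integer NODATA. A small sketch for pulling the components out of a key (the key format is copied from the responses above; the layer names are inferred from the per-layer split described later in this thread):

```javascript
// Int.MinValue, used by GeoTrellis as the NODATA sentinel for integer tiles.
const NODATA = -2147483648;

// Parse a "List(a, b, c)" key from a RasterGroupedCount response into
// its component pixel values.
function parseKey(key) {
  return key.match(/-?\d+/g).map(Number);
}

const [nlcd, soilGroup, soilTexture] = parseKey("List(22, -2147483648, 41)");
// soilGroup === NODATA here, i.e. those pixels had no soil group value.
```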
Hmm, okay, taking a look.
> Try interacting with the main app. Notice if long, thin, diagonal shapes take less time than staging
Also having some trouble getting this to work in the main app. I put the new geoprocessing service in place, but the app now returns 500s for the analysis endpoints:
I split the request into three individual RasterGroupedCount calls, each requesting one layer, and found the NLCD and Soil Texture values to be identical, but the Soil Group values to differ between the two:
nlcd-request-huc8.json.txt nlcd-response-huc8.json.txt nlcd-response-huc8-new.json.txt
soil-request-huc8.json.txt soil-response-huc8.json.txt soil-response-huc8-new.json.txt
soil-texture-request-huc8.json.txt soil-texture-response-huc8.json.txt soil-texture-response-huc8-new.json.txt
$ diff -u nlcd-response-huc8.json nlcd-response-huc8-new.json
$ diff -u soil-response-huc8.json soil-response-huc8-new.json
--- soil-response-huc8.json 2017-05-23 13:44:53.000000000 -0400
+++ soil-response-huc8-new.json 2017-05-23 13:47:09.000000000 -0400
@@ -1,11 +1,11 @@
{
"result": {
"List(2)": 1900300,
- "List(-2147483648)": 834465,
+ "List(-2147483648)": 832452,
"List(4)": 654295,
"List(3)": 490209,
"List(6)": 120754,
- "List(7)": 348163,
+ "List(7)": 348030,
"List(1)": 1154272
}
}
\ No newline at end of file
$ diff -u soil-texture-response-huc8.json soil-texture-response-huc8-new.json
Investigating why this is.
> Also having some trouble getting this to work in the main app. I put the new geoprocessing service in place, but the app now returns 500s for the analysis endpoints:
This turned out to be an error on my local setup: I had neglected to rebundle after reviewing a PR that was opened before some updates to how the client & API handle AOIs were merged.
Okay, after doing some more tests, I'm finding that there are consistent differences across all layers (I may have made a mistake with the above results). Comparing with values on staging, I find around a ~0.1% difference, with values on this branch being lower than the previous values. See the calculations here: https://codepen.io/anon/pen/zweBPG?editors=0012
Average Count Diff Percent: 0.0872089680750769
Average Area Diff Percent: 0.09302289928008084
Average Coverage Diff Percent: 0.09542434609139239
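The linked CodePen isn't reproduced here, but the Count figure can be computed along these lines (a sketch; `oldResult`/`newResult` stand in for the parsed staging and branch `result` objects):

```javascript
// Average per-key percent difference between two RasterGroupedCount results.
function avgDiffPercent(oldResult, newResult) {
  const keys = Object.keys(oldResult);
  const total = keys.reduce((acc, k) => {
    const oldCount = oldResult[k];
    const newCount = newResult[k] || 0; // keys missing in the new result count as 0
    return acc + (Math.abs(oldCount - newCount) / oldCount) * 100;
  }, 0);
  return total / keys.length;
}
```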
@echeipesh Do you have any idea why this is happening?
@mmcfarland & @ajrobbins Would a 0.1% difference in results be worth the 10-20% improvement in performance?
Area and Coverage are already a little murky to begin with, given that we're doing fuzzy logic on latitude and pixel size to come up with them, so I think this is well within the tolerance range. The total number of cells counted is different, though, which seems a little more concerning (and likely explains why the rest of the results also differ):
5,502,458
5,500,312
We're now accounting for ~2,000 fewer cells @ 30 m² each. My guess is it's a boundary thing. I'd be interested in speculation on why the results differ, so we at least have some justification for the change. For all we know, it may be more accurate, but we've probably got to account for that 60k m² before we commit to it.
You might be able to do a simple test by supplying a very small AoI with just a handful of cells in it and seeing the number of total cells counted.
For reference, I summed with:
```javascript
// Sum the per-key counts in each RasterGroupedCount response.
let sumOld = 0;
let sumNew = 0;
for (let x in oldValues.result) {
  sumOld += oldValues.result[x];
}
for (let x in newValues.result) {
  sumNew += newValues.result[x];
}
```
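Plugging the Schuylkill HUC-08 totals quoted above into that comparison, as a quick back-of-envelope check:

```javascript
// Totals from the Schuylkill HUC-08 comparison above.
const sumOld = 5502458; // cells counted on staging (envelope clip)
const sumNew = 5500312; // cells counted on this branch (multipolygon clip)

const missingCells = sumOld - sumNew;                 // 2146 cells
const missingPercent = (missingCells / sumOld) * 100; // ~0.039% of all cells
```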
Using smaller AoIs, like the 71 km² HUC-12 Middle Conestoga River near Leacock, PA, the results are identical. Going to try with a HUC-10 now.
Just tried the 698 km² HUC-10 Conestoga River near Leacock, PA: staging counts 774,661 cells, while develop only counts 748,228. The difference of 26,433 cells is about 3.4%, more significant than we've seen so far.
Perhaps we should hold off on this until we can explain this properly.
The main difference here is between clipping using `GeometryCollection(polygons).envelope` and `polygons.unionGeometries.asMultiPolygon.get`. Wonder if there's some quirky difference between the intersection methods for the Extent and MultiPolygon types?
@kellyi seems to have hit the nail on the head: intersection for Extent vs MultiPolygon.
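To see why the two clipping paths fetch different tile sets, here's a toy sketch (not the GeoTrellis API — just a unit-tile grid, with a segment standing in for a long, thin, diagonal AoI): the envelope touches every tile in the bounding box, while a true geometry intersection touches only the tiles the shape actually crosses.

```javascript
// Liang-Barsky-style test: does the segment (x0,y0)-(x1,y1) intersect the
// axis-aligned rectangle [rx0,rx1] x [ry0,ry1]?
function segIntersectsRect(x0, y0, x1, y1, rx0, ry0, rx1, ry1) {
  let t0 = 0, t1 = 1;
  const dx = x1 - x0, dy = y1 - y0;
  for (const [p, q] of [[-dx, x0 - rx0], [dx, rx1 - x0], [-dy, y0 - ry0], [dy, ry1 - y0]]) {
    if (p === 0) {
      if (q < 0) return false; // parallel to this edge and outside it
    } else {
      const t = q / p;
      if (p < 0) t0 = Math.max(t0, t); else t1 = Math.min(t1, t);
      if (t0 > t1) return false;
    }
  }
  return true;
}

// A thin diagonal "AoI" (here just a segment) across a 4x4 grid of unit tiles.
const [x0, y0, x1, y1] = [0.5, 0.2, 3.8, 3.5];

let bboxTiles = 0, geomTiles = 0;
for (let ix = 0; ix < 4; ix++) {
  for (let iy = 0; iy < 4; iy++) {
    // Envelope approach: fetch every tile touching the bounding box.
    if (ix + 1 > Math.min(x0, x1) && ix < Math.max(x0, x1) &&
        iy + 1 > Math.min(y0, y1) && iy < Math.max(y0, y1)) bboxTiles++;
    // Geometry approach: fetch only the tiles the shape actually crosses.
    if (segIntersectsRect(x0, y0, x1, y1, ix, iy, ix + 1, iy + 1)) geomTiles++;
  }
}
// bboxTiles is 16 (the whole grid); geomTiles is 7 (just the diagonal band)
```

With real tiles, that gap is exactly the extra S3 fetching the envelope approach pays for on long diagonal shapes.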
Investigated this a bit more with @echeipesh today, and will continue tomorrow. Using the Conestoga River HUC-10 which had a difference of 3.5%, and running both types of intersections, we get the following tiles:
huc10-conestoga-envelope-intersect.geojson.txt
huc10-conestoga-multipolygon-intersect.geojson.txt
And for reference, here is the Scala Worksheet we're working with: fetch-less.sc.txt
Once we figure out the reprojection errors in the tiles (which hopefully aren't a symptom of something worse), we may be able to pinpoint the missing tile that accounts for the missing values.
Here are the corresponding images for the Schuylkill HUC-08, which had a difference of 0.1%:
This is a visualization of the query in the Albers projection:
Bug found and fixed: https://github.com/locationtech/geotrellis/pull/2213
As the above fix will be in GeoTrellis 1.1, and likely will not be back-ported to 0.10, we cannot use the multipolygon intersection optimization in the current implementation. Thus I am closing this pull request.
As we are considering upgrading to GeoTrellis 1.1 for the Collections API as part of https://github.com/WikiWatershed/model-my-watershed/pull/1919, it is possible we may still get to use this functionality in the future.
Overview
Previously we would fetch all tiles from S3 that intersected with the bounding box of the incoming shape. This was wasteful, especially in cases where the shape was a long diagonal.
Now, by cropping to the multiPolygon resulting from unioning all incoming shapes, rather than calculating an envelope, we ensure that only relevant tiles are fetched.
Since fetching from S3 is a sizable portion of our latency, this change improves performance 10-20%.
Connects #41
Demo
Here's a table of runtimes in seconds with nlcd-soils-request-huc8.json, which is a Soils + NLCD RasterGroupedCount operation (used for TR-55) with the Schuylkill HUC-08, and nlcd-soils-request.json, which is the same request but with the Little Neshaminy HUC-12 instead:
(Runtime tables comparing develop and tt/waste-less-fetch-less for each of the two requests)
Testing Instructions
Tagging @echeipesh for quick code review.
1. Download nlcd-soils-request-huc8.json and nlcd-soils-request.json, and use them as is, or replace the `polygon` with other shapes
2. Use the JSONs to develop a baseline:
3. Check out this branch, and build the geoprocessing service and load it into the worker (where `$MMW_GEOPROCESSING_DIR` is your local directory where you have checked out this repo, and `$MMW_DIR` is your local directory where you have checked out the main app):
4. Run the above requests again, and save their responses to new files:
5. Ensure that the original response and the `-new` responses have the same values
6. Try interacting with the main app. Notice if long, thin, diagonal shapes take less time than staging