WikiWatershed / mmw-geoprocessing

A Spark Job Server job for Model My Watershed geoprocessing.
Apache License 2.0

Implement RasterLinesJoin operation #68

Closed: kellyi closed this 7 years ago

kellyi commented 7 years ago

Overview

This PR implements the RasterLinesJoin geoprocessing operation used in MapShed.

~All the infrastructure's wired up and the operation returns a count, but the count's currently slightly off.~ This is fixed now. My initial implementation double-counted values because the same row/col cells were visited multiple times. My second try undercounted cells because it tried to throw out cells that had already been visited, but didn't track the visited cells correctly.

Now we initialize the TrieMap with a key that is a tuple comprising the raster values, the cell col/row, and the tile's SpatialKey:

var pixelGroups: TrieMap[(List[Int], List[Int], SpatialKey), Int] = TrieMap.empty

... and then we only count a cell if it hasn't already been visited in that tile.

At the end we drop the cell col/row & SpatialKey...

.map { case ((key, _, _), value) => (key, value) }

...then group by the raster values and sum.
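For anyone following along, here is a minimal, self-contained sketch of the dedupe-then-sum idea described above. It is not the code in this PR: `SpatialKey` is a hypothetical stand-in for the GeoTrellis class so the snippet runs without the library, and `RasterLinesJoinSketch`, `PixelKey`, and `countByRasterValues` are made-up names. It also assumes the visited cells have already been collected into a plain sequence, whereas the real operation walks the rasterized lines tile by tile.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical stand-in for GeoTrellis' SpatialKey (assumption, not the real class).
final case class SpatialKey(col: Int, row: Int)

object RasterLinesJoinSketch {
  // (raster values at the cell, List(col, row) of the cell, key of the tile)
  type PixelKey = (List[Int], List[Int], SpatialKey)

  def countByRasterValues(visits: Seq[PixelKey]): Map[List[Int], Int] = {
    val pixelGroups: TrieMap[PixelKey, Int] = TrieMap.empty

    // putIfAbsent is atomic: a cell visited again within the same tile
    // leaves the map unchanged, so it is counted exactly once.
    visits.foreach(visit => pixelGroups.putIfAbsent(visit, 1))

    pixelGroups.toSeq
      // Drop the cell col/row and the SpatialKey, keeping only the raster values.
      .map { case ((key, _, _), value) => (key, value) }
      // Group by the raster values and sum the per-cell counts.
      .groupBy(_._1)
      .map { case (key, pairs) => key -> pairs.map(_._2).sum }
  }
}
```

With this shape, visiting the same cell twice in one tile contributes 1 rather than 2, while the same raster values seen in a different cell or a different tile still add to the total.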

Connects #53

Testing

rajadain commented 7 years ago

Taking a look now.

rajadain commented 7 years ago

Tested in MMW against identical runs on Staging: the values are identical. Great work!

kellyi commented 7 years ago

Wonderful! I'm going to make the adjustments suggested above, then test everything out again before merging.

kellyi commented 7 years ago

Made the changes suggested above & tested again on my local. Everything's still working as before. Going to merge this once the tests pass.

Thanks for your help with this!