maxachis closed this issue 3 years ago
Conor Thompkins and Catalina Moreno are currently on this issue, but others can feel free to join in.
Did we decide not to do this?
I think we did in fact decide not to do this, but I'm gonna hold off on closing it until we get confirmation from @conorotompkins @cgmoreno that this is in fact the case.
@maxachis I recall that we decided not to pursue the Uber H3 option.
Having confirmed with Cat as well, I think we're all good to close this.
Uber manages its locational indexing using a hexagonal binning system, which is a fancy way of saying it groups nearby coordinates into hexagons. We could possibly use this same method for deduping -- putting location coordinates into Uber's coordinate-to-hexagon system and, if there is more than one location in the same hexagon, flagging those locations as potential duplicates.
The H3 system can be found at https://h3geo.org/. The core library is written in C, but there are bindings for a number of other languages, including R, Python, JavaScript, and C#, among others. These bindings aren't necessarily officially supported.
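For reference, here is a rough sketch of the flagging idea described above: assign each location to a cell, then flag any cell that holds more than one location. The function names, the sample stores, and the resolution value are all hypothetical. The default `square_cell` helper is a plain square grid used only so the sketch runs without extra dependencies; it is not H3. Assuming the h3-py v4 bindings are installed, `h3.latlng_to_cell` could be passed as `to_cell` instead.

```python
import math
from collections import defaultdict


def square_cell(lat, lng, res):
    """Stand-in for an H3 lookup: a plain square grid, NOT hexagons.

    res is the number of decimal places of a degree used as the cell edge
    (an assumption made for this sketch only).
    """
    size = 1.0 / (10 ** res)
    return (math.floor(lat / size), math.floor(lng / size))


def flag_potential_duplicates(locations, to_cell=square_cell, res=3):
    """Group (name, lat, lng) records by cell and keep the crowded cells.

    to_cell is any function (lat, lng, res) -> hashable cell id.
    With h3-py v4 installed, h3.latlng_to_cell has that shape
    (its res is an H3 resolution from 0 to 15, not decimal places).
    """
    cells = defaultdict(list)
    for name, lat, lng in locations:
        cells[to_cell(lat, lng, res)].append(name)
    # Only cells with more than one location are potential duplicates.
    return {cell: names for cell, names in cells.items() if len(names) > 1}


# Hypothetical sample data: two nearly identical points and one far away.
stores = [
    ("Store A", 40.44060, -79.99590),
    ("Store A (duplicate?)", 40.44062, -79.99588),
    ("Store B", 40.46000, -79.92000),
]
flagged = flag_potential_duplicates(stores)
for cell, names in flagged.items():
    print(cell, names)
```

One caveat with any binning approach, hexagonal or not: two locations that sit close together but on opposite sides of a cell boundary land in different cells and would not be flagged, so the cell size would need some tuning.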
A sub-issue of #16 - Dedupe food stores in merged dataset