CodeForPittsburgh / food-access-map-data

Data for the food access map
MIT License

Try deduplication using Uber's H3 System #46

Closed maxachis closed 3 years ago

maxachis commented 3 years ago

Uber manages its locational indexing using a hexagonal binning system, which is a fancy way of saying it groups nearby coordinates into hexagons. We could possibly use this same method for deduping -- putting location coordinates into Uber's coordinate-to-hexagon system and, if more than one location falls in the same hexagon, flagging them as potential duplicates.

The H3 system is documented at https://h3geo.org/. The core library is written in C, and there are bindings for a number of other languages, including R, Python, JavaScript, and C#, among others, though these aren't necessarily officially supported.
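A rough sketch of the idea, in Python, might look like the following. It is only an illustration under stated assumptions: the store names and coordinates are made up, `flag_potential_duplicates` and `square_grid_cell` are hypothetical helpers, and the default binning is a simple square grid stand-in so the snippet runs without extra packages. The `h3_cell` wrapper assumes the `h3` Python package at version 4 or later, where the call is `h3.latlng_to_cell` (version 3 named it `h3.geo_to_h3`); the resolution of 10 is an arbitrary guess, not a tuned value.

```python
import math
from collections import defaultdict


def square_grid_cell(lat, lng, cell_deg=0.001):
    """Stand-in binning (NOT H3): snap a coordinate to a square grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lng / cell_deg))


def h3_cell(lat, lng, resolution=10):
    """H3 binning. Assumes the third-party `h3` package, v4+ API."""
    import h3  # imported lazily so the sketch runs without h3 installed
    return h3.latlng_to_cell(lat, lng, resolution)


def flag_potential_duplicates(locations, cell_of=square_grid_cell):
    """Group (name, lat, lng) records by cell; return cells holding 2+ records."""
    cells = defaultdict(list)
    for name, lat, lng in locations:
        cells[cell_of(lat, lng)].append(name)
    return {cell: names for cell, names in cells.items() if len(names) > 1}


# Hypothetical records, for illustration only.
locations = [
    ("Store A", 40.44062, -79.99589),
    ("Store A (dup)", 40.44065, -79.99585),
    ("Store B", 40.45000, -79.95000),
]

flagged = flag_potential_duplicates(locations)
for cell, names in flagged.items():
    print(cell, names)
```

Passing `cell_of=h3_cell` would swap in the hexagonal binning. One caveat with any binning approach: two nearby points that sit on opposite sides of a cell boundary land in different cells and would not be flagged, so this is a candidate filter rather than a complete dedupe.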

A sub-issue of #16 - Dedupe food stores in merged dataset

maxachis commented 3 years ago

Conor Thompkins and Catalina Moreno are currently on this issue, but others can feel free to join in.

hellonewman commented 3 years ago

Did we decide not to do this?

maxachis commented 3 years ago

I think we did decide not to do this, but I'm gonna hold off on closing it until we get confirmation from @conorotompkins @cgmoreno that this is in fact the case.

conorotompkins commented 3 years ago

@maxachis I recall that we decided not to pursue the Uber H3 option.

maxachis commented 3 years ago

Having confirmed with Cat as well, I think we're all good to close this.