datameet-pune / datameet-pune.github.io

Common repo and documentation space for DataMeet Pune chapter
https://sites.google.com/view/datameetpune/home
GNU General Public License v3.0

Geohash idea for Bus stops (and other location redundancy) de-duplication #14

Open answerquest opened 6 years ago

answerquest commented 6 years ago

The Pune Open Data portal gives us lat-long data for bus stops, but the entries are not unique and some stops repeat heavily. The BRT stops came in a separate list with no duplicates, so they are easy to pull out, but the larger dataset of non-BRT stops needs work.

Geohashes map lat-long values onto rectangular grid cells. A pair of lat-longs that are very close to each other but not identical will usually resolve to the same geohash, so this could be a way of clustering the stops data. Links:
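
A minimal sketch of the idea in pure Python, so it does not depend on any particular geohash library. The function names (`geohash_encode`, `group_stops`), the `(name, lat, lon)` tuple layout and the default precision of 7 characters (cells of very roughly 150 m) are assumptions for illustration, not something taken from the portal data:

```python
from collections import defaultdict

# Standard geohash base-32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=7):
    """Encode a lat-long pair as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def group_stops(stops, precision=7):
    """Group (name, lat, lon) tuples by the geohash cell they fall into."""
    groups = defaultdict(list)
    for name, lat, lon in stops:
        groups[geohash_encode(lat, lon, precision)].append(name)
    return dict(groups)
```

Any group with more than one entry is a candidate duplicate to review. One caveat: two stops only a few metres apart can still land in adjacent cells if a cell boundary runs between them, so a single pass like this may miss some pairs. Comparing each cell with its neighbours, or repeating the grouping at a coarser precision, are possible ways to catch those.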

answerquest commented 6 years ago

Some more links: