AnanseGroup / atlas_of_innovation

Interactive map, database, API for all the innovation spaces everywhere
https://www.atlasofinnovation.com

Dealing with duplicated or repeated spaces #95

Open pedro-q opened 5 years ago

pedro-q commented 5 years ago

Since no single field or value uniquely identifies a space, we need a method or process for detecting when data may be repeated in the database. I'm currently working on some schemes that may be useful for this task, namely fuzzy hashing and searching for nearby places by latitude and longitude. I'll go into more detail in a README I'm writing, but I think it'll be good to get the conversation going in this thread, so more ideas are welcome.
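To make the two ideas concrete, here is a minimal sketch of how proximity search and fuzzy name matching might be combined. It is not the scheme from the README: the record keys (`id`, `name`, `latitude`, `longitude`), the thresholds, and the function names are all assumptions for illustration, and the standard library's `difflib.SequenceMatcher` stands in for a real fuzzy hash.

```python
import math
from difflib import SequenceMatcher
from itertools import combinations

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def name_similarity(a, b):
    """Similarity ratio in [0, 1], ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.casefold().strip(), b.casefold().strip()).ratio()


def candidate_duplicates(spaces, max_km=1.0, min_similarity=0.8):
    """Return (id_a, id_b, score) for pairs that are both close and similarly named.

    `spaces` is assumed to be a list of dicts with hypothetical keys
    "id", "name", "latitude", "longitude". Both thresholds are guesses
    that would need tuning against real data.
    """
    pairs = []
    # Pairwise comparison is O(n^2); fine for a sketch, too slow for a large table.
    for a, b in combinations(spaces, 2):
        distance = haversine_km(a["latitude"], a["longitude"],
                                b["latitude"], b["longitude"])
        if distance > max_km:
            continue
        score = name_similarity(a["name"], b["name"])
        if score >= min_similarity:
            pairs.append((a["id"], b["id"], round(score, 2)))
    return pairs
```

The pairs this returns are only candidates for manual review, not confirmed duplicates. For a large table, the all-pairs loop would probably need to be replaced by a spatial index or a bounding-box query in the database.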

annawb commented 5 years ago

This sounds great! The existing column “Alias” contains alternate names for a bunch of spaces from early de-duplication efforts, particularly from fablabs.io (http://fablabs.io/) and hackerspaces.org (http://hackerspaces.org/).
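As a rough illustration of how the “Alias” column could feed into duplicate detection, the sketch below builds a lookup from normalized names and aliases to space ids. The storage format of “Alias” is not described in this thread, so the code assumes each record carries a hypothetical `alias` list of strings; the record keys and function names are likewise made up for the example.

```python
def normalize(name):
    """Lower-case a name and drop everything except letters and digits."""
    return "".join(ch for ch in name.casefold() if ch.isalnum())


def build_alias_index(spaces):
    """Map each normalized name or alias to the set of space ids that use it.

    Assumes each space is a dict with "id", "name" and an optional
    "alias" list (a hypothetical shape, not the real schema).
    """
    index = {}
    for space in spaces:
        for label in [space["name"], *space.get("alias", [])]:
            index.setdefault(normalize(label), set()).add(space["id"])
    return index


def find_existing(index, incoming_name):
    """Return ids of spaces already known under this name or one of its aliases."""
    return index.get(normalize(incoming_name), set())
```

A check like this could run before inserting a record from an external source, with the fuzzier comparisons reserved for names that miss the index.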

(Sent by email in reply to pedroquiroz's comment of Dec 3, 2018, 11:41 AM.)
