pedro-q opened this issue 5 years ago
This sounds great! The existing "Alias" column contains alternate names for many spaces, collected during earlier de-duplication efforts, particularly from fablabs.io and hackerspaces.org.
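As a rough illustration of how the Alias column could feed into de-duplication, here is a minimal Python sketch that checks a candidate name against each space's name and aliases after normalizing case, accents, and punctuation. It assumes the aliases have already been split into a list per space. The actual storage format of the Alias column isn't described here, and the function names and sample records are hypothetical.

```python
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents, replace punctuation with spaces, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(cleaned.split())


def find_by_name_or_alias(candidate, spaces):
    """Return ids of spaces whose name or any alias normalizes to the candidate's form."""
    key = normalize(candidate)
    matches = []
    for space in spaces:
        # Hypothetical record layout: "aliases" is assumed to be a list of strings.
        known = [space["name"], *space.get("aliases", [])]
        if any(normalize(n) == key for n in known):
            matches.append(space["id"])
    return matches


# Hypothetical sample records, for illustration only.
spaces = [
    {"id": 1, "name": "Fab Lab Example", "aliases": ["FabLab Ex", "Example Makerspace"]},
    {"id": 2, "name": "Example Hackerspace", "aliases": []},
]

print(find_by_name_or_alias("fablab ex", spaces))  # [1]
```

Exact matching on normalized names only catches trivial variants; it is meant as a cheap first pass before any fuzzier comparison.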
On Dec 3, 2018, at 11:41 AM, pedroquiroz <notifications@github.com> wrote:
As we can't uniquely identify each space by any single field or value, we need a method or process to detect when data may be duplicated in the database. I'm currently working on some schemes that may be useful for this task, namely fuzzy hashing and searching for nearby places by latitude and longitude. I'll go into more detail in a README I'm writing, but I think it would be good to get the conversation about this issue going in this thread, so more ideas are welcome.
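A minimal sketch of combining the two ideas above: flag pairs of spaces that lie within some distance of each other (haversine great-circle distance on latitude/longitude) and whose names are similar. For the name comparison it uses Jaccard similarity over character trigrams as a simple stand-in for fuzzy hashing. Hashing schemes such as MinHash estimate this same Jaccard similarity, which makes them useful at larger scale. The thresholds (`max_km`, `min_similarity`), field names, and sample records are assumptions for illustration, not values from the project.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trigrams(name):
    """Set of character trigrams of a lowercased, whitespace-collapsed, padded name."""
    s = "  " + " ".join(name.lower().split()) + " "
    return {s[i:i + 3] for i in range(len(s) - 2)}


def name_similarity(a, b):
    """Jaccard similarity (0.0 to 1.0) of the two names' trigram sets."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def possible_duplicates(spaces, max_km=1.0, min_similarity=0.5):
    """Yield (id_a, id_b, km, similarity) for pairs that are close and similarly named."""
    for a, b in combinations(spaces, 2):
        km = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"])
        if km > max_km:
            continue  # too far apart to be the same place
        sim = name_similarity(a["name"], b["name"])
        if sim >= min_similarity:
            yield a["id"], b["id"], round(km, 3), round(sim, 2)


# Hypothetical sample records, for illustration only.
spaces = [
    {"id": 1, "name": "Example Makerspace", "lat": 40.0, "lon": -3.0},
    {"id": 2, "name": "Example Maker Space", "lat": 40.001, "lon": -3.001},
    {"id": 3, "name": "Example Makerspace", "lat": 41.0, "lon": -3.0},
    {"id": 4, "name": "Other Hackerspace", "lat": 40.0005, "lon": -3.0},
]

print(list(possible_duplicates(spaces)))  # only the (1, 2) pair is reported
```

Comparing every pair is O(n²). For the full database, one would likely first bucket spaces by a coarse latitude/longitude grid (or use a spatial index) and only compare records within neighbouring buckets. Flagged pairs are candidates for human review rather than automatic merges.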