CodeForPittsburgh / food-access-map-data

Data for the food access map
MIT License

Script to flag potential duplicates (Super De Duper!) #57

Closed: hellonewman closed this issue 3 years ago

hellonewman commented 3 years ago

I am hoping either @cgmoreno could provide more detail or Matt knows what this is referring to!

hellonewman commented 3 years ago

Ah, I think this was it (from Catalina's ppt):

Duplication ID step (Matt)
Euclidean distance check → group1
String distance check + numeric address check + green grocer rules + food bank rule → group2
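For anyone picking this up later, here is a rough sketch of how the two-stage flagging in that slide might look. This is not Matt's script; it is a minimal illustration under several assumptions: the record fields (`id`, `name`, `address`, `lat`, `lon`), the sample rows, the thresholds, and the function names are all made up, `difflib.SequenceMatcher` stands in for whatever string distance is actually used, and group2 is assumed to be a refinement of group1. The green grocer and food bank rules are left out because the thread does not say what they are.

```python
import math
import re
from difflib import SequenceMatcher

# Hypothetical records: field names and values are invented for illustration.
SITES = [
    {"id": 1, "name": "Corner Market", "address": "100 Main St", "lat": 40.4400, "lon": -79.9900},
    {"id": 2, "name": "CORNER MARKET", "address": "100 Main Street", "lat": 40.4401, "lon": -79.9901},
    {"id": 3, "name": "East Side Pantry", "address": "102 Main St", "lat": 40.4402, "lon": -79.9900},
    {"id": 4, "name": "Uptown Grocery", "address": "500 Penn Ave", "lat": 40.4600, "lon": -79.9200},
]


def euclidean(a, b):
    """Straight-line distance between two records, in raw degrees."""
    return math.hypot(a["lat"] - b["lat"], a["lon"] - b["lon"])


def street_number(address):
    """Leading house number of an address as a string, or None if absent."""
    match = re.match(r"\s*(\d+)", address)
    return match.group(1) if match else None


def name_similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1] (stand-in string distance)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def flag_pairs(sites, max_dist=0.001, min_name_sim=0.8):
    """Return (group1, group2) as lists of id pairs.

    group1: pairs that pass the Euclidean distance check.
    group2: group1 pairs that also share a street number and have similar names
            (assumption: group2 narrows group1 rather than replacing it).
    """
    group1, group2 = [], []
    for i, a in enumerate(sites):
        for b in sites[i + 1:]:
            if euclidean(a, b) > max_dist:
                continue
            pair = (a["id"], b["id"])
            group1.append(pair)
            num_a, num_b = street_number(a["address"]), street_number(b["address"])
            same_number = num_a is not None and num_a == num_b
            if same_number and name_similarity(a["name"], b["name"]) >= min_name_sim:
                group2.append(pair)
    return group1, group2


group1, group2 = flag_pairs(SITES)
print("group1:", group1)
print("group2:", group2)
```

Comparing every pair is quadratic in the number of sites, which is fine for a sketch but may need blocking (for example by zip code) on the full dataset.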

hellonewman commented 3 years ago

3/16 update: Matt's script produces 3 columns; each is supposed to increase in granularity (or does it?). The testing step only uses 1 of these columns. We need to figure out which column to weight most.

Next step: select one option based on the unit testing. Then Matt will implement 1 of them.
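One possible way to pick between the columns is to score each against a small set of hand-labeled pairs. The sketch below assumes each column is a boolean "flagged as duplicate" value per candidate pair; the column names (`col_a`, `col_b`, `col_c`), the labels, and the flag values are invented and do not come from Matt's script.

```python
def precision_recall(flags, labels):
    """Precision and recall of boolean flags against boolean ground-truth labels."""
    tp = sum(1 for f, y in zip(flags, labels) if f and y)
    fp = sum(1 for f, y in zip(flags, labels) if f and not y)
    fn = sum(1 for f, y in zip(flags, labels) if not f and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical hand labels for five candidate pairs (True = real duplicate).
LABELS = [True, True, False, False, True]

# Hypothetical flag columns, loosest to strictest; names are placeholders.
COLUMNS = {
    "col_a": [True, True, True, True, True],
    "col_b": [True, True, False, True, False],
    "col_c": [True, False, False, False, False],
}

scores = {name: precision_recall(flags, LABELS) for name, flags in COLUMNS.items()}
for name, (precision, recall) in scores.items():
    print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```

Whether to favor precision or recall depends on how costly a missed duplicate is compared with a false flag that someone has to review by hand.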

hellonewman commented 3 years ago

Matt + MAX to discuss today

maxachis commented 3 years ago

The script to flag potential duplicates has been added. The only remaining issue is unit testing.