Code4HR / open-health-inspection-scraper

Scraper for the open-health-inspector app.
Apache License 2.0

Numbers in restaurant names #17

Closed: wbprice closed this issue 10 years ago

wbprice commented 10 years ago

[Screenshot: 2014-07-10 at 10:11:07 PM]

A lot of the restaurants near me have numbers after the name. See above for an example. What purpose do these serve, and if none, could we modify the scraper or remove them from the database?

qwo commented 10 years ago

cc @ttavenner I thought it was a way to give each POI a unique identifier, but I agree they're largely not human-friendly.

Also, since it's uniform, we could just parse and string-replace anything in parentheses for now to remove it.
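
The "string-replace anything in parentheses" idea could look roughly like the sketch below. It is not the scraper's actual code: the function name `strip_parenthesized` is hypothetical, and it assumes the number always appears as a parenthesized group in the name, which the thread suggests but does not spell out.

```python
import re

# Assumption: the unwanted number is wrapped in parentheses,
# e.g. "Some Restaurant (12-345)". The exact format is not given in the thread.
_PAREN_RE = re.compile(r"\s*\([^)]*\)")


def strip_parenthesized(name: str) -> str:
    """Remove every parenthesized group from a name, plus the spaces before it."""
    return _PAREN_RE.sub("", name).strip()


if __name__ == "__main__":
    print(strip_parenthesized("Some Restaurant (12-345)"))
```

Note that this is deliberately blunt: it would also strip a legitimate parenthetical such as a location in "Cafe (Downtown)", so it only fits as a stopgap.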

ttavenner commented 10 years ago

I will need to investigate this. The biggest difficulty is making sure it can be identified in the scraper; otherwise we will need a separate hygiene process to run after each update. Interestingly, this could actually be an internal ID from VA DOH. It looks like the first two digits might identify the type and the second the vendor. I wish we had that data on all of them.
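
If the number really is an internal ID, one option is to split it off into its own field instead of discarding it. The sketch below is only an illustration under assumptions: `split_name_and_code` is a hypothetical helper, and it assumes the code is a trailing parenthesized group made of digits, spaces, and hyphens. The real format is not confirmed anywhere in this thread.

```python
import re
from typing import Optional, Tuple

# Assumption: the code sits at the end of the name, in parentheses, and
# consists only of digits, spaces, and hyphens (must start with a digit).
_TRAILING_CODE_RE = re.compile(
    r"^(?P<name>.*?)\s*\((?P<code>\d[\d\s-]*)\)\s*$", re.DOTALL
)


def split_name_and_code(raw: str) -> Tuple[str, Optional[str]]:
    """Return (clean_name, code); code is None when no trailing code is found."""
    match = _TRAILING_CODE_RE.match(raw)
    if match is None:
        # Leave names without a numeric trailing group untouched.
        return raw.strip(), None
    return match.group("name").strip(), match.group("code").strip()
```

Because the pattern only matches numeric groups, a name like "Cafe (Downtown)" passes through unchanged with `None` as its code.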

ttavenner commented 10 years ago

I added some basic correction for this into the scraper and created a script to fix all existing instances. While I was at it I removed newlines from a number of vendor names and updated the scraper to handle these.
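
For the newline cleanup mentioned above, a minimal sketch of one common approach is shown here. It is an assumption about how such a fix could work, not the change that was actually committed, and `normalize_whitespace` is a hypothetical name.

```python
def normalize_whitespace(name: str) -> str:
    """Collapse newlines, tabs, and repeated spaces into single spaces."""
    # str.split() with no arguments splits on any run of whitespace
    # (including \n and \r\n) and drops leading/trailing whitespace.
    return " ".join(name.split())
```

The same function could be applied both in the scraper at parse time and in a one-off script over existing records, so that old and new vendor names end up normalized the same way.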