Code4HR / open-health-inspection-app

The frontend for Code for Hampton Roads' Open Health Inspection Data.
http://ohi.code4hr.org
GNU General Public License v3.0

How to handle missing data #43

Closed wbprice closed 10 years ago

wbprice commented 10 years ago

I've noticed a couple of instances where there isn't inspection data in the database for specific restaurants. As examples, I don't see anything for Pho 79 and This Old House where I live.

[screenshot from 2014-05-03 11 42 59]

I'm wondering if this is an instance of that information just not being in the database or some other problem.

ttavenner commented 10 years ago

Are you finding a listing for the restaurant with no inspection data specifically or just no listing for the restaurant at all? I'm having trouble finding these restaurants at all in the MongoDB. Either way it sounds like missing data since I've verified that both of these are on the Healthspace site and have inspections. Strange since we have quite a few vendors for Virginia Beach. I will run the scraper again today to see if it picks up anything.

ttavenner commented 10 years ago

Well, shoot. Just discovered a major flaw in the scraper design. We've still got more work to do before the database is complete.

bschoenfeld commented 10 years ago

What is it??

On Sunday, May 4, 2014, Tommy Tavenner notifications@github.com wrote:

Well, shoot. Just discovered a major flaw in the scraper design. We've still got more work to do before the database is complete.

ttavenner commented 10 years ago

The query system behind the Healthspace site has a hard limit of 1,000 results per request. So in places with more than 1,000 vendors, querying for 10,000 results ('Count=10000') still only returns the first 1,000. You need to page through the results using a 'start=' variable until the page returns 'No Results'. It should be a pretty easy fix, and it should only affect a small number of cities. I am also taking it as an opportunity to update the scraper to use SmartyStreets for geocoding.
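
A minimal sketch of that paging loop in Python, for illustration only; it is not the actual scraper code. The names `fetch_all_vendors`, `fetch_page`, and `make_fake_site` are hypothetical, the 1-based start offset is an assumption, and an empty page stands in for the 'No Results' response. The fake site mimics the 1,000-result cap so the loop can run without touching Healthspace.

```python
from typing import Callable, Iterator, List

PAGE_LIMIT = 1000  # hard cap on results per request, per the comment above


def fetch_all_vendors(fetch_page: Callable[[int], List[str]]) -> Iterator[str]:
    """Yield every vendor by advancing the start offset until a page is empty.

    fetch_page(start) is a hypothetical callable that requests one page
    beginning at the given start offset and returns the vendors on it.
    """
    start = 1  # assumption: the start offset is 1-based
    while True:
        page = fetch_page(start)
        if not page:  # empty page stands in for 'No Results'
            break
        yield from page
        start += len(page)  # move past everything already seen


def make_fake_site(vendors: List[str], limit: int = PAGE_LIMIT) -> Callable[[int], List[str]]:
    """Build a stand-in for the site that never returns more than `limit` rows."""
    def fetch_page(start: int) -> List[str]:
        return vendors[start - 1 : start - 1 + limit]
    return fetch_page


if __name__ == "__main__":
    city = [f"vendor-{i}" for i in range(2500)]
    site = make_fake_site(city)
    print(len(site(1)))                        # a single request is capped
    print(len(list(fetch_all_vendors(site))))  # paging recovers the full list
```

Advancing by the number of rows actually returned, rather than by a fixed page size, keeps the loop correct even if the site's cap differs from what is assumed here.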

ttavenner commented 10 years ago

Updated the scraper and re-ran it specifically on VA Beach as a test. It successfully added all the vendors between P and Z. I am re-running it for the rest of the file now.

wbprice commented 10 years ago

My favorite sushi place thanks you. http://c4hrva.github.io/open-health-inspection-app/#/vendor/53664939afbfd1046cecbbfb

qwo commented 10 years ago

The app is looking sharp, @wbprice. The repeat symbol took me a second to figure out.