Closed by wbprice 10 years ago
Are you finding a listing for the restaurant with no inspection data specifically, or just no listing for the restaurant at all? I'm having trouble finding these restaurants at all in MongoDB. Either way it sounds like missing data, since I've verified that both of these are on the Healthspace site and have inspections. Strange, since we have quite a few vendors for Virginia Beach. I will run the scraper again today to see if it picks up anything.
Well, shoot. Just discovered a major flaw in the scraper design. We've still got more work to do before the database is complete.
What is it??
On Sunday, May 4, 2014, Tommy Tavenner wrote:
> Well, shoot. Just discovered a major flaw in the scraper design. We've still got more work to do before the database is complete.
The query system behind the Healthspace site has a hard limit of 1,000 results returned per query. So in places with more than 1,000 vendors, querying for 10,000 results ('Count=10000') still returns only the first 1,000. You need to page through the results using a 'start=' variable until the page returns 'No Results'. It should be a pretty easy fix, and it should only affect a small number of cities. I am also taking it as an opportunity to update the scraper to use SmartyStreets for geocoding.
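For anyone following along, here is a rough sketch of the paging loop described above. It is not the actual scraper code. `fetch_all_vendors`, `fetch_page`, and `fake_fetch_page` are hypothetical names, the 1-based starting offset is an assumption, and an empty list stands in for the 'No Results' page. The real fetcher would build the request with the 'start=' and 'Count=' parameters and parse the returned page.

```python
from typing import Callable, Iterator, List

PAGE_LIMIT = 1000  # hard cap on results per query, as described above


def fetch_all_vendors(
    fetch_page: Callable[[int, int], List[str]],
    page_size: int = PAGE_LIMIT,
    first_start: int = 1,  # assumption: offsets are 1-based
) -> Iterator[str]:
    """Yield every vendor by advancing the start offset until a page is empty."""
    start = first_start
    while True:
        rows = fetch_page(start, page_size)
        if not rows:  # stand-in for the 'No Results' page
            break
        yield from rows
        # Advance by what actually came back, not by what was requested,
        # so a server-side cap smaller than page_size is handled correctly.
        start += len(rows)


# Hypothetical stand-in for the real site: 2,500 vendors, capped at 1,000 per query.
_FAKE_VENDORS = [f"vendor-{i}" for i in range(1, 2501)]


def fake_fetch_page(start: int, count: int) -> List[str]:
    capped = min(count, PAGE_LIMIT)
    return _FAKE_VENDORS[start - 1 : start - 1 + capped]


all_vendors = list(fetch_all_vendors(fake_fetch_page, page_size=10000))
print(len(all_vendors))  # prints 2500
```

Stepping the offset by the number of rows actually returned is what keeps this working even when the request asks for 10,000 and the server quietly hands back 1,000.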
Updated the scraper and re-ran it specifically on VA Beach as a test. It successfully added all the vendors between P and Z. I am re-running it for the rest of the file now.
My favorite sushi place thanks you. http://c4hrva.github.io/open-health-inspection-app/#/vendor/53664939afbfd1046cecbbfb
The app is looking sharp, @wbprice. The repeat symbol took me a second to figure out.
I've noticed a couple of instances where there isn't inspection data in the database for specific restaurants. As examples, I don't see anything for Pho 79 and This Old House where I live.
I'm wondering if this is an instance of that information just not being in the database or some other problem.