Code4HR / open-health-inspection-scraper

Scraper for the open-health-inspector app.
Apache License 2.0

Better locality and city naming #24

Closed bschoenfeld closed 10 years ago

bschoenfeld commented 10 years ago

I noticed that there were no restaurants in Blacksburg in the app. When I dug into it, I found that we were scraping locality names completely wrong. Once I fixed that, I also found that city names weren't always being scraped correctly either. I think I fixed those two things, as well as another bug where Mongo was allowing NULL ids (maybe that was just me?).

I'd love someone else to verify these changes before the pull is accepted. This will probably require the DB to be dropped and regenerated too.
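
For readers following the NULL id issue: as far as I understand it, MongoDB only generates an `_id` when the key is missing, so a document that carries an explicit null `_id` can be stored as-is. A minimal sketch of a pre-insert guard follows. The names `require_id` and `split_by_id` and the sample field names are hypothetical and do not come from this pull request.

```python
def require_id(doc, id_field="_id"):
    """Return doc unchanged if it carries a usable id, else raise ValueError.

    Hypothetical helper: treats a missing key, None, and a blank string
    as invalid ids.
    """
    value = doc.get(id_field)
    if value is None or (isinstance(value, str) and not value.strip()):
        raise ValueError("document has no usable %s: %r" % (id_field, doc))
    return doc


def split_by_id(docs, id_field="_id"):
    """Partition docs into (valid, rejected) without touching the database."""
    valid, rejected = [], []
    for doc in docs:
        try:
            valid.append(require_id(doc, id_field))
        except ValueError:
            rejected.append(doc)
    return valid, rejected


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    scraped = [
        {"_id": "a1", "name": "Example Diner"},
        {"_id": None, "name": "Record with null id"},
        {"name": "Record with no id key"},
    ]
    valid, rejected = split_by_id(scraped)
    print(len(valid), "valid;", len(rejected), "rejected")
```

Only the `valid` list would then be handed to the insert call, and the rejected records can be logged for a later look.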

ttavenner commented 10 years ago

Will test this shortly and merge if everything looks good. I have a local instance of the MongoDB I can test it out on. Assuming everything looks good I will generate a new DB in place of the scheduled scraper run next week and then make the switch over.

bschoenfeld commented 10 years ago

Dirty fix for #24, should reference #23 instead.

bschoenfeld commented 10 years ago

Hold off. I got this error when I ran it for an extended time:

pymongo.errors.OperationFailure: cursor id '2189918118744092517' not valid at server
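
For context, an error like this commonly shows up when a server-side cursor is left idle long enough to expire (about ten minutes by default) while the client does slow work between reads. The thread does not say what caused it here or how it was fixed, so the following is only a sketch of one possible workaround: read the ids first, then pull documents in small batches that are fully fetched before any slow processing starts. `iter_documents` and `batch_size` are hypothetical names; `find` is the standard PyMongo collection method.

```python
def iter_documents(collection, batch_size=500):
    """Yield every document without holding one cursor open for the whole run.

    Sketch only: `collection` is assumed to behave like a PyMongo
    Collection (a `find(filter, projection)` method returning an iterable).
    """
    # Short-lived cursor: fetch only the ids and exhaust it immediately.
    ids = [doc["_id"] for doc in collection.find({}, {"_id": 1})]

    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        # list() drains this batch's cursor before the caller does any
        # slow per-document work, so nothing sits idle on the server.
        batch = list(collection.find({"_id": {"$in": chunk}}))
        for doc in batch:
            yield doc
```

The trade-off is one extra pass to collect ids plus one query per batch. PyMongo's `find` also accepts `no_cursor_timeout=True`, which keeps a single cursor alive but leaves it to the caller to close it.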

bschoenfeld commented 10 years ago

I'm pretty happy with this now. What do you think @ttavenner?

ttavenner commented 10 years ago

I was trying to test it last night but ran into an error that I think might have had more to do with my local MongoDB instance than with the scraper. I will reset that collection and try to test it again tonight or tomorrow and hopefully get this merged.

ttavenner commented 10 years ago

@bschoenfeld I'm running the scraper now and I really like the improvements you've made as well as the general bug fixing, in particular the insertion date and the % complete. Nice touches.