Code4HR / open-health-inspection-scraper

Scraper for the open-health-inspector app.
Apache License 2.0

Create a scraper that grabs all the health code data for the state of Virginia. #1

Closed: wbprice closed this issue 10 years ago.

jalbertbowden commented 10 years ago

Already exists, if you want it.

On Saturday, February 22, 2014, William Blaine Price <notifications@github.com> wrote:

Reply to this email directly or view it on GitHub: https://github.com/c4hrva/open-health-inspection-scraper/issues/1

J. Albert Bowden II

jalbertbowden@gmail.com

http://bowdenweb.com/

qwo commented 10 years ago

Tag @bschoenfeld, he's working on it right now.

jalbertbowden commented 10 years ago

On what? Hitting the repo isn't good enough? You're there... you can't just yell across the room?

bschoenfeld commented 10 years ago

I checked your repo. I see a bunch of PHP cURL commands and HTML.

I'm looking to get something running every day that takes all new data and puts it in a database. From there, we can write something that serves that data through a RESTful API.

If I missed something in your repo that can get us there faster, please let me know.
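A minimal sketch of the daily "take all new data and put it in a database" step, assuming SQLite as a stand-in for whichever database the project ends up using. The table and the record fields (`establishment_id`, `name`, `city`, `inspection_date`, `violations`) are hypothetical and do not come from the actual Virginia inspection site.

```python
import sqlite3
from typing import Iterable

# Hypothetical schema: one row per establishment per inspection date.
SCHEMA = """
CREATE TABLE IF NOT EXISTS inspections (
    establishment_id TEXT NOT NULL,
    name             TEXT NOT NULL,
    city             TEXT,
    inspection_date  TEXT NOT NULL,  -- ISO 8601, e.g. '2014-02-20'
    violations       INTEGER,
    PRIMARY KEY (establishment_id, inspection_date)
)
"""


def store_new_inspections(conn: sqlite3.Connection,
                          records: Iterable[dict]) -> int:
    """Insert scraped records, silently skipping rows already stored
    (or rows that violate the NOT NULL constraints).

    Because the primary key is (establishment_id, inspection_date),
    re-running the daily job over overlapping data only adds new rows.
    Returns the number of rows actually inserted.
    """
    conn.execute(SCHEMA)
    before = conn.total_changes  # running count of rows changed on this connection
    conn.executemany(
        "INSERT OR IGNORE INTO inspections "
        "(establishment_id, name, city, inspection_date, violations) "
        "VALUES (:establishment_id, :name, :city, :inspection_date, :violations)",
        records,
    )
    conn.commit()
    return conn.total_changes - before


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sample = [{"establishment_id": "A1", "name": "Example Diner",
               "city": "Norfolk", "inspection_date": "2014-02-20",
               "violations": 2}]
    print(store_new_inspections(conn, sample))  # 1: new row stored
    print(store_new_inspections(conn, sample))  # 0: duplicate ignored
```

Keying on establishment and inspection date is what makes the job safe to rerun from a scheduler: overlapping scrapes are ignored rather than duplicated, and an API layer can then read straight from the table.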

jalbertbowden commented 10 years ago

That's "there"; it just needs a cron job to execute at chosen time(s) and a tie-in to a DB. I was going to use MySQL, then decided to try Mongo but never figured it out. The scraper is there, though it's a shitty implementation that grabs each letter. The data quality is wack, so I could not find a solid scrape point. For example, Norfolk is misspelled... more so, the business names are hit or miss. Pretty unreliable.
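For misspellings like the Norfolk example, one hedged option is to fuzzy-match scraped city names against a known list using Python's standard-library `difflib`. The `KNOWN_CITIES` list, the `normalize_city` helper, and the 0.8 cutoff are illustrative assumptions; business names vary far more than city names and would likely need more than this.

```python
import difflib
from typing import List, Optional

# Hypothetical reference list; a real one would cover every Virginia locality.
KNOWN_CITIES: List[str] = ["Norfolk", "Virginia Beach", "Chesapeake",
                           "Portsmouth", "Suffolk", "Hampton", "Newport News"]


def normalize_city(raw: str, known: List[str] = KNOWN_CITIES,
                   cutoff: float = 0.8) -> Optional[str]:
    """Map a possibly misspelled city name onto a known spelling.

    Returns the closest known name if its similarity ratio is at least
    `cutoff`; otherwise returns None so the record can be flagged for review.
    """
    cleaned = " ".join(raw.split()).title()  # collapse whitespace, fix casing
    matches = difflib.get_close_matches(cleaned, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(normalize_city("Norfok"))  # Norfolk
```

Returning None instead of forcing a match keeps genuinely unknown values visible rather than silently mapping them to the wrong city.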

bschoenfeld commented 10 years ago

Where do you strip stuff out of the HTML?
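A minimal sketch of separating the data from the markup with Python's standard-library `html.parser`, assuming (the thread does not say) that the inspection results are laid out as rows of an HTML table. `TableRowParser` and `extract_rows` are hypothetical names, not code from either repo.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True: "&amp;" arrives as "&"
        self.rows: List[List[str]] = []      # finished rows of cell text
        self._row: Optional[List[str]] = None  # cells of the current row
        self._cell: Optional[List[str]] = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments (e.g. text split by <b> or <br>) and collapse whitespace.
            self._row.append(" ".join(" ".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left unclosed by malformed markup

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_rows(html: str) -> List[List[str]]:
    """Return the table rows in `html` as lists of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Each row comes back as a list of strings, so individual fields (name, address, date) can be saved separately instead of storing the page as one blob.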

jalbertbowden commented 10 years ago

Never got that far... that was part of my problem. I couldn't get the markup separated to save; I was only saving the total.

Dude, this code totally sucks... I just saw the scraper creation header come in by email and wanted to help. We both know you can do a much better implementation.

bschoenfeld commented 10 years ago

No problem. I've got a simple Python thing kinda working. We can keep working on it!

bschoenfeld commented 10 years ago

MVP in 5ecc7d3a9e0e07367d09dbf5c134d653b3c72b0d