TheLeagueOfGentlemen / hackathon-skeleton


Scrape data from Dig in VT for selected categories #2

Open benglass opened 10 years ago

benglass commented 10 years ago

One approach we might take to scraping the data: do a search for a category, then on the results page add all of the results to "My Places". You can then generate a URL that has a text list of the results. For example, this is a saved "My Places" list where I added all the breweries, and it is a lot more scrapable.

http://www.diginvt.com/my-places/?k=55d21e1d7cc1dcf94c6bcd3a35901353
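A minimal sketch of pulling place links out of a saved list page like the one above, using only the standard library. The page markup is not described in this thread, so the `/places/` link prefix and the names `LinkCollector` and `extract_links` are assumptions for illustration, not the site's actual structure.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect unique href values that start with a given prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix  # hypothetical URL prefix for place pages
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep first-seen order and skip duplicates.
        if href and href.startswith(self.prefix) and href not in self.links:
            self.links.append(href)


def extract_links(html, prefix):
    """Return the matching links found in an HTML string, in page order."""
    collector = LinkCollector(prefix)
    collector.feed(html)
    return collector.links
```

The HTML would be fetched separately (for example with `urllib.request`) and passed in as a string, which keeps the parsing step testable without hitting the site.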

@briancappello @eborden @evanrbriggs @meddy

benglass commented 10 years ago

Additionally, they have a concept of "trails", which are basically collections of things to do in an area, and it's in a more scrapable format. We could scrape them, turn them into "achievements", and then modify them.

http://www.diginvt.com/trails/

benglass commented 10 years ago

This data does not include natural stuff like hiking, so we may need to find a separate data source for that.

briancappello commented 10 years ago

Places and events are (I think) working. Data is in the data/diginvt folder of branch unlock. The JSON data currently has the following format, but I'm of course open to suggestions for improvement:

{ 'events' => { 'urls' => {'name', 'description', 'date', 'times',
                           'address', 'town', 'state', 'zipcode',
                           'phone', 'categories', 'website'} },
  'places' => { 'urls' => {'name', 'description', 'hours',
                           'address', 'town', 'state', 'zipcode',
                           'phone', 'email', 'website', 'categories', 'seasons'} }
}
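A quick sanity check for that format could look like the sketch below. It assumes the loaded JSON has top-level `events` and `places` keys, each mapping a page URL to one record with the fields listed above; that reading of `'urls'` and the helper name `missing_fields` are my assumptions, not something confirmed in the data.

```python
# Field names copied from the format described above.
EVENT_FIELDS = {"name", "description", "date", "times", "address", "town",
                "state", "zipcode", "phone", "categories", "website"}
PLACE_FIELDS = {"name", "description", "hours", "address", "town", "state",
                "zipcode", "phone", "email", "website", "categories", "seasons"}


def missing_fields(data):
    """Map (kind, url) to the sorted field names absent from that record."""
    expected = {"events": EVENT_FIELDS, "places": PLACE_FIELDS}
    problems = {}
    for kind, fields in expected.items():
        # Assumption: each kind maps URL -> record dict.
        for url, record in data.get(kind, {}).items():
            missing = sorted(fields - set(record))
            if missing:
                problems[(kind, url)] = missing
    return problems
```

Running this over the parsed file (loaded with `json.load`) would flag records the scraper only partially filled in.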

As for the trails data, I'll get working on that now.

benglass commented 10 years ago

It would be good to work on geocoding the place and events data. I can take a stab at this tonight.

benglass commented 10 years ago

@briancappello @eborden @evanrbriggs @meddy

I added a script to geocode the places data. I didn't geocode the events data, as I'm not sure we're using it, but it would be trivial to have the script do that as well. The geocoded dataset from Dig in VT is at https://github.com/TheLeagueOfGentlemen/hackathon-skeleton/blob/unlock/data/diginvt/parsed_data_geocoded.json

There was only one geocoding failure: 1428 Millbrook Road, Waitsfield, VT 05673
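For anyone extending this to the events data, here is a rough sketch of the general shape such a geocoding pass might take. It is not the script in the repo: the `geocode` callable stands in for whatever geocoding service is used, and the `lat`/`lng` keys and function names are hypothetical.

```python
def full_address(record):
    """Join a record's address fields into one string for the geocoder."""
    state_zip = " ".join(
        v for v in (record.get("state"), record.get("zipcode")) if v
    )
    parts = [record.get("address"), record.get("town"), state_zip]
    return ", ".join(p for p in parts if p)


def geocode_records(records, geocode):
    """Add coordinates to each record in place; return addresses that failed.

    `geocode` is a hypothetical callable taking an address string and
    returning a (lat, lng) tuple, or None when no match is found.
    """
    failures = []
    for record in records.values():
        address = full_address(record)
        coords = geocode(address)
        if coords is None:
            failures.append(address)
            continue
        record["lat"], record["lng"] = coords  # hypothetical key names
    return failures
```

Passing the geocoder in as a function keeps the pass independent of any one provider and makes it easy to collect failures like the one above for manual lookup.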