IndivisibleOnline / Indivisible-Plugin

The Custom Plugin that Powers IW Wordpress Site

Automated events scraper #23

Open hagbardc opened 7 years ago

hagbardc commented 7 years ago

Events: Automated Events scraper for Westchester-level events and actions

Needs information: more details required

imlindaryan commented 7 years ago

This is an issue for Tom Pierce and I don't see him on here. Tasking self to contact him and @kennybatf to make sure he gets an invite / ID for GitHub

hagbardc commented 7 years ago

Acknowledged - I reached out on Slack a while back for his github login, but he didn't get back. I'll follow up.

imlindaryan commented 7 years ago

@tom-iw welcome to the world of IW GitHub! What is the status of this and how can we help? Last updates I have are: "I am still unable to upload to iw-stage. i'm trying to add another calendar or two, but i'm not sure if they'll import properly. i'm assuming i should create a group repo on github for the calendar scrapers (rather than keeping it under my account). if there are any strong feelings about how i do this (make a scraper repo, make a scala repo, don't name it "Hansel", etc), let me know soon!"

tom-iw commented 7 years ago

the code is posted in my personal repo and i'll pop it into the group repo later this week or over the weekend. the county calendar scrapers should be "deployable", and i have scrapers to pull the mamaroneck and yonkers calendars that i need to test and check in.

the question is how to automate this. the calendaring plugin we bought for the website will poll a published iCal calendar (this is just an ICS file served over HTTP). the scrapers generate these files.

i can start running this periodically on a machine/website that i control (but i cannot grant access to others). or we can get it set up on the AWS box or something so you are not relying on flakytom to kick the box if something goes wrong. i think it would be ideal if i helped someone else set it up in AWS so i am not a single point of failure, but if it makes more sense for me to just get it going for now, i am happy to do that.

tom-iw commented 7 years ago

this is just copypasta from the slack channel - a sketch of deployment instructions

hi folks - finally have some code we can try running on the server. seems to create iCal files that import without creating dupes! https://github.com/tom-iw/IW

simplest way to get up and going is to install SBT (can do it by downloading+unfurling a tar/zip and putting the "sbt" command in your path, or by installing a deb/RPM): http://www.scala-sbt.org/0.13/docs/Installing-sbt-on-Linux.html

then run: sbt "run-main org.indivisiblewestchester.county.LegCalScraper /tmp/legcal.ics"

to scrape the legislative calendar into output file /tmp/legcal.ics - note that everything after sbt is wrapped in one set of quotes. similarly: sbt "run-main org.indivisiblewestchester.county.CntyCalScraper /tmp/cntycal.ics"

will pull the county calendar (but it will be heavily filtered to remove basket weaving, etc)

you can see example imports on stage: http://iw-stage.indivisiblewestchester.org/events/ - converting back to ical format for the imports lost us some snazz (map integration for one), but i'll happily take that to be done with the dupes

forgot to mention that you must be in the "iw" dir when you run the sbt "run-main" commands - so to cron this it will probably be best to have a shell script that does something like:

PATH=$PATH:/path/to/sbt
cd $GIT_CHECKOUT/iw
sbt "run-main...."
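a slightly fuller sketch of that wrapper, in case it helps whoever sets up the cron job. this is only an illustration: the default checkout location, the SBT override variable, and the run_scraper helper name are all made up here and would need adjusting for the real box.

```shell
#!/bin/sh
# Hypothetical cron wrapper for the calendar scrapers.
# GIT_CHECKOUT and SBT are assumptions; point them at the real paths.
GIT_CHECKOUT="${GIT_CHECKOUT:-$HOME/IW}"   # assumed location of the git checkout
SBT="${SBT:-sbt}"                           # sbt launcher; override if it is not on PATH

# run_scraper MAIN_CLASS OUTPUT_FILE
# Runs one scraper from inside the "iw" dir, which the sbt build requires.
run_scraper() {
    main_class="$1"
    out_file="$2"
    # Subshell so the cd does not leak into the rest of the script.
    # Everything after sbt is passed as a single quoted argument.
    ( cd "$GIT_CHECKOUT/iw" && "$SBT" "run-main $main_class $out_file" )
}

# Example calls for the real script (left commented out in this sketch):
# run_scraper org.indivisiblewestchester.county.LegCalScraper /tmp/legcal.ics
# run_scraper org.indivisiblewestchester.county.CntyCalScraper /tmp/cntycal.ics
```

with the two example calls uncommented and the script saved somewhere like /path/to/scrape.sh (a placeholder), a crontab entry such as `0 * * * * /path/to/scrape.sh` would run it hourly. how often to run it is a guess; pick whatever interval suits the calendars.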

oh and one more thing - i swear i think this is it - i think it would make things easiest if we had the output files drop into a place where they'd be served via http. then we can set up the calendar plugin to subscribe to the calendars.
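one possible way to do that drop, sketched as a small shell function. the publish_ics name and the web directory argument are hypothetical, and the check only looks at the first line of the file (an iCalendar file normally begins with BEGIN:VCALENDAR), so treat it as a cheap sanity test rather than real validation.

```shell
#!/bin/sh
# publish_ics SRC_FILE WEB_DIR
# Copies a generated .ics file into a directory that is served over HTTP.
# Both arguments are placeholders; the real web root is not known here.
publish_ics() {
    src="$1"
    web_dir="$2"
    dest="$web_dir/$(basename "$src")"

    # Refuse to publish a file that does not start like an iCalendar object
    # (strip a trailing CR first, since iCal lines end in CRLF).
    if ! head -n 1 "$src" | tr -d '\r' | grep -qi '^BEGIN:VCALENDAR$'; then
        echo "not publishing $src: first line is not BEGIN:VCALENDAR" >&2
        return 1
    fi

    # Copy to a temp name in the same directory, then rename, so the calendar
    # plugin never fetches a half-written file while polling.
    tmp="$dest.tmp.$$"
    cp "$src" "$tmp" && mv "$tmp" "$dest"
}
```

usage would be something like `publish_ics /tmp/legcal.ics /path/to/webroot/calendars`, where the second path is whatever directory the web server actually exposes. the plugin would then subscribe to the matching URL.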