Data4Democracy / discursive

Twitter topic search and indexing with Elasticsearch

Setup cron ability in Docker container: discursive-graph-data branch #25

Closed by hadoopjax 7 years ago

hadoopjax commented 7 years ago

Having recently "Dockerized" the Twitter collection process, one piece we're missing is the ability to run the Twitter collection from within the Docker container on a schedule (i.e. using cron). The OS is Alpine Linux, and here's a good example of what we're trying to do. Check out the branch here.

Please assign this to yourself if you are working on it so we're not stepping all over each other :)

acompa commented 7 years ago

Potentially silly question: why not update the crontab in the Dockerfile? Something like

CMD (crontab -l ; echo "0 * * * * my_scraper_here") | sort - | uniq - | crontab -

(inspired by this SO post, which also ensures idempotency)
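To make the idempotency point concrete, here is a minimal sketch of the same merge-and-deduplicate idea applied to plain text rather than a live crontab, so it can be tried safely. The helper name `add_cron_entry` is hypothetical, and `my_scraper_here` is just the placeholder from the line above.

```shell
#!/bin/sh
# Hypothetical helper: merge one cron entry into crontab text read from stdin.
# It mirrors the "(crontab -l ; echo ...) | sort | uniq" pipeline, but works on
# plain text so nothing is written to a real crontab.
add_cron_entry() {
    entry="$1"
    # Emit the existing lines plus the new one, then drop exact duplicates.
    { cat; printf '%s\n' "$entry"; } | sort | uniq
}

# Apply the same entry twice, starting from an empty crontab.
once=$(printf '' | add_cron_entry '0 * * * * my_scraper_here')
twice=$(printf '%s\n' "$once" | add_cron_entry '0 * * * * my_scraper_here')

# The second application changes nothing: still a single line.
printf '%s\n' "$twice"
```

One caveat with this approach: `sort` reorders every line, so anything order-sensitive in an existing crontab (for example variable assignments placed before the jobs they affect) may end up somewhere else.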

WanderingStar commented 7 years ago

@acompa, because crond isn't running inside the container, changing the crontab won't help.
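Roughly, the container needs something like the sketch below as its entrypoint: write the crontab, then run crond in the foreground so the daemon is actually alive. This is an illustration under assumptions, not the contents of any PR; the function names, the schedule, and the `/app/collect.py` path are made up, and the flags are the ones BusyBox crond (what Alpine ships) accepts.

```shell
#!/bin/sh
# Sketch of a container entrypoint for Alpine (BusyBox crond).
# Function names, the schedule, and /app/collect.py are illustrative assumptions.

# Write a single-entry crontab for root into the given crontab directory.
install_crontab() {
    cron_dir="$1"; schedule="$2"; job="$3"
    mkdir -p "$cron_dir"
    printf '%s %s\n' "$schedule" "$job" > "$cron_dir/root"
}

# Run crond in the foreground so it becomes the container's long-lived process.
# BusyBox flags: -f stay in foreground, -l 2 set log level, -c crontab directory.
run_cron() {
    exec crond -f -l 2 -c "$1"
}

# Intended use as the image's entrypoint (not executed here):
#   install_crontab /etc/crontabs '0 * * * *' 'python /app/collect.py'
#   run_cron /etc/crontabs
```

Running crond with `-f` matters because a container stops when its main process exits; a daemon that forks into the background would leave nothing in the foreground to keep it up.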

I'm busy this afternoon, but if nobody has done this by this evening, I'll take a stab at it.

hadoopjax commented 7 years ago

Hey @WanderingStar, if you think this is something you can tackle, do you want to go ahead and assign the issue to yourself? My thinking was basically this, but I would be absolutely thrilled to have another person dig in :) We need all the hands we can get working on this stuff.

WanderingStar commented 7 years ago

I can't assign the issue to myself (because I'm not a contributor?) but I just created a pull request.

hadoopjax commented 7 years ago

Bah, ok. Running checks now!!

WanderingStar commented 7 years ago

@hadoopjax Note: the crontab in the PR just runs test.py, not the actual script. I didn't set up the AWS infrastructure to let me run the script.

hadoopjax commented 7 years ago

@WanderingStar no prob; changed around the auths (added twitter/aws) and pointed crontab to the relevant files and it worked perfectly. Thanks again for your help; hopefully we can talk you into doing some more stuff :)

WanderingStar commented 7 years ago

@hadoopjax Sure. I'm only a dabbler in data science, but I know a thing or two about software/systems.

acompa commented 7 years ago

@WanderingStar Thanks for the explanation! I figured the question would be silly. :) Happy to have a knowledgeable engineer on-board!