Closed (hadoopjax closed 7 years ago)
Potentially silly question: why not update the crontab in the Dockerfile? Something like:
CMD (crontab -l ; echo "0 * * * * my_scraper_here") | sort - | uniq - | crontab -
(inspired by this SO post, which also ensures idempotency)
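To show why the `sort | uniq` step makes the one-liner above safe to run repeatedly, here is a minimal sketch. It assumes a scratch file standing in for the real crontab (so it runs without cron installed); `CRONTAB_FILE` and `add_entry` are names made up for illustration, and `my_scraper_here` is the same placeholder as above.

```shell
#!/bin/sh
# Hypothetical stand-in for the real crontab: a scratch file,
# so the sketch can run on a machine without cron installed.
CRONTAB_FILE=$(mktemp)
ENTRY='0 * * * * my_scraper_here'

# Same pipeline shape as the one-liner: existing entries plus the
# new entry, sorted and deduplicated, then written back.
add_entry() {
    { cat "$CRONTAB_FILE"; echo "$ENTRY"; } | sort - | uniq - > "$CRONTAB_FILE.new"
    mv "$CRONTAB_FILE.new" "$CRONTAB_FILE"
}

add_entry
add_entry   # running it again must not add a duplicate line

# Count lines that match the entry exactly.
ENTRY_COUNT=$(grep -cxF -- "$ENTRY" "$CRONTAB_FILE")
echo "entries: $ENTRY_COUNT"
```

One side effect worth knowing: `sort` reorders the existing lines, which is harmless for cron entries but will move any comments around.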
@acompa, because crond isn't running inside the container, changing the crontab won't help.
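A rough sketch of what that means in practice: the container's entrypoint has to install the job and then start crond itself, in the foreground, so the container stays alive. This is not the code from the PR. It assumes BusyBox crond on Alpine, which reads root's jobs from `/etc/crontabs/root` and stays in the foreground with `-f`; the `/app/test.py` path, the `install_job` name, and the `START_CROND` switch are hypothetical.

```shell
#!/bin/sh
# Sketch of a container entrypoint; paths and the job line are assumptions.
CRON_FILE="${CRON_FILE:-/etc/crontabs/root}"             # root's crontab for BusyBox crond on Alpine
CRON_LINE="${CRON_LINE:-0 * * * * python /app/test.py}"  # hypothetical hourly job

# Append the job only if that exact line is not already present.
install_job() {
    mkdir -p "$(dirname "$CRON_FILE")"
    touch "$CRON_FILE"
    grep -qxF -- "$CRON_LINE" "$CRON_FILE" || printf '%s\n' "$CRON_LINE" >> "$CRON_FILE"
}

# Hypothetical switch: only start the daemon when asked, so the
# function above can be tried outside a container.
if [ "${START_CROND:-0}" = "1" ]; then
    install_job
    # -f keeps crond in the foreground as the container's main process.
    exec crond -f
fi
```

In a Dockerfile this would be wired up with something like `ENV START_CROND=1` plus a `CMD` pointing at the script. Without a foreground process like this, the crontab gets written but nothing ever reads it.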
I'm busy this afternoon, but if nobody has done this by this evening, I'll take a stab at it.
Hey @WanderingStar, if you think this is something you can tackle, do you want to go ahead and assign the issue to yourself? My thinking was basically this, but I would be absolutely thrilled to have another person dig in :) We need all the hands we can get working on this stuff.
I can't assign the issue to myself (because I'm not a contributor?) but I just created a pull request.
Bah, ok. Running checks now!!
@hadoopjax Note: the crontab in the PR just runs test.py, not the actual script. I didn't set up the AWS infrastructure to let me run the script.
@WanderingStar No prob; I changed the auths around (added Twitter/AWS), pointed the crontab to the relevant files, and it worked perfectly. Thanks again for your help; hopefully we can talk you into doing some more stuff :)
@hadoopjax Sure. I'm only a dabbler in data science, but I know a thing or two about software/systems.
@WanderingStar Thanks for the explanation! I figured the question would be silly. :) Happy to have a knowledgeable engineer on board!
Having recently "Dockerized" the Twitter collection process, one piece we're missing is the ability to run the Twitter collection from within the Docker container on a schedule (i.e. using cron). The OS is Alpine Linux, and here's a good example of what we're trying to do. Check out the branch here.
Please assign this to yourself if you are working on it so we're not stepping all over each other :)