MrDiggles2 / cru-scrape

Scraper of CRU sites

Queue and workers #19

Closed MrDiggles2 closed 3 weeks ago

MrDiggles2 commented 1 month ago

Figure out how to add a queue of site + year pairs to scrape

Options:

  1. Use Redis and rq

    • Host on chan-ls.com
    • Create a Dockerized worker so we can run it anywhere (EC2, Raspberry Pi, or a Docker container running in the background)
    • Needs a monitoring dashboard - maybe https://github.com/Parallels/rq-dashboard
  2. Use AWS SQS

    • Still needs a Dockerized worker; it can't run on Lambda because of the 15-minute maximum execution time
    • Use the AWS-provided dashboards

Either way, the payload on the queue should be a tuple of (YEAR, SITE_ID)
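
A rough sketch of what that payload could look like, independent of which queue we pick. Everything here is hypothetical: the names `ScrapeJob`, `encode`, `decode`, and `drain` are made up for illustration, the site ID is assumed to be a string, and the standard-library `queue.Queue` is only a stand-in for Redis or SQS. JSON is used for the message body because it has no tuple type but round-trips a two-element array cleanly.

```python
import json
import queue
from typing import Callable, NamedTuple


class ScrapeJob(NamedTuple):
    """Queue payload: one (YEAR, SITE_ID) pair. SITE_ID as str is an assumption."""
    year: int
    site_id: str


def encode(job: ScrapeJob) -> str:
    # Serialize the pair as a two-element JSON array.
    return json.dumps([job.year, job.site_id])


def decode(body: str) -> ScrapeJob:
    # Rebuild the tuple from a message body produced by encode().
    year, site_id = json.loads(body)
    return ScrapeJob(int(year), str(site_id))


def drain(jobs: "queue.Queue[str]", scrape: Callable[[int, str], None]) -> int:
    """Handle messages until the stand-in queue is empty; return the count."""
    handled = 0
    while True:
        try:
            body = jobs.get_nowait()
        except queue.Empty:
            return handled
        job = decode(body)
        scrape(job.year, job.site_id)  # the real scraper call would go here
        handled += 1


if __name__ == "__main__":
    jobs: "queue.Queue[str]" = queue.Queue()
    for year in (2019, 2020):
        jobs.put(encode(ScrapeJob(year, "site-a")))

    seen = []
    drain(jobs, lambda year, site_id: seen.append((year, site_id)))
    print(seen)
```

Keeping the encode/decode step separate from the worker loop should mean the same payload works for either option; only the code that pushes and pops message bodies would change.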