Data4Democracy / are-you-fake-news

Dockerization - Live web scraping #18

Open N2ITN opened 5 years ago

N2ITN commented 5 years ago

Status

Assigning to @N2ITN. Please use this branch: https://github.com/N2ITN/are-you-fake-news/tree/develop-dockerize

Issue

Several web scraping functions that formerly ran on AWS Lambda need to be moved into one service.

There are three sets of functionality here: (1) spidering a news site for article URLs, (2) scraping a single URL for article text, and (3) calling (2) for a list of URLs using asyncio. The code is currently located in ./_scrape_lambda/code/.
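To make the scope concrete, here is a rough sketch of how the three pieces of functionality described above could sit together in one module. This is not the code in ./_scrape_lambda/code/. Every name here (`fetch_html`, `spider`, `scrape`, `scrape_many`) is hypothetical, it uses only the standard library, and it assumes article text lives in `<p>` tags. The blocking fetch is run in worker threads via asyncio, which may differ from how the existing Lambda code does it.

```python
import asyncio
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class _LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


class _TextParser(HTMLParser):
    """Collects text found inside <p> tags (assumed to hold the article body)."""

    def __init__(self):
        super().__init__()
        self._depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth and data.strip():
            self.chunks.append(data.strip())


def fetch_html(url, timeout=10):
    """Blocking HTTP GET; hypothetical default fetcher, swappable for tests."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def spider(site_url, fetch=fetch_html):
    """(1) Return same-host links found on site_url.

    A real spider would also filter these down to article pages.
    """
    parser = _LinkParser()
    parser.feed(fetch(site_url))
    host = urlparse(site_url).netloc
    urls = {urljoin(site_url, h) for h in parser.hrefs}
    return sorted(u for u in urls if urlparse(u).netloc == host)


def scrape(url, fetch=fetch_html):
    """(2) Return the paragraph text of a single page."""
    parser = _TextParser()
    parser.feed(fetch(url))
    return " ".join(parser.chunks)


async def scrape_many(urls, fetch=fetch_html):
    """(3) Run scrape() for each URL concurrently in worker threads."""
    loop = asyncio.get_running_loop()
    tasks = [loop.run_in_executor(None, scrape, u, fetch) for u in urls]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # Drop URLs whose scrape raised, keep the ones that returned text.
    return {u: r for u, r in zip(urls, results) if isinstance(r, str)}
```

In a single service, an entry point would call `spider()` once per news site and pass the result to `asyncio.run(scrape_many(...))`. Passing `fetch` as a parameter keeps the functions testable without network access.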

Tasks

N2ITN commented 5 years ago

Back-burnering this for now, as replacing the existing AWS Lambda calls potentially creates more problems than it solves.

The challenges introduced by adding this functionality to the docker cluster include:

The problems that implementing this issue would solve:

The challenges could be overcome, and it would still be worth having this as an option. Unless someone feels passionately about working on this issue, I will pause work on it until the other parts of the app are implemented in docker-compose.