Data4Democracy / are-you-fake-news


Clean up `webserver_get.py` #3

Closed N2ITN closed 6 years ago

N2ITN commented 6 years ago

Status

Assigning to myself - help welcome!

Issue

The webserver_get.py module is central to the web app: it encapsulates the backend of a user request. Right now it's a tangled mess.

Right now it performs all of these functions:

This is way too much for one module to cover and still be maintainable. The goal here is to break out its discrete functionality so that each portion can exist as its own AWS Lambda function. The deployment of the Lambdas will be a different issue.

Tasks

Split code accordingly. Most of the work is moving part of it to article_collector. Here is the suggested execution order, with the idea that webserver_get mostly calls other functions but doesn't do much on its own.

  1. webserver_get.py: Sits on EC2 as before. Verifies URL, sends it to article_collector
  2. article_collector.py: Future Lambda function. Crawls target website and sends URLs to web scraping Lambdas, returns the concatenated article text.
  3. webserver_get.py: Receives text results, checks validity, relays them to the deep learning classifier Lambda. Sends those results to plotter.py
  4. plotter.py: Future Lambda function. Makes plots, sends them to S3. (Right now this module sits on the web server and saves plot images locally)
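
To make the execution order above concrete, here is a rough sketch of what a slimmed-down webserver_get could look like once the other pieces are split out. This is only an illustration, not the project's actual code: the names `verify_url`, `handle_request`, `collect_articles`, `classify`, `plot` and the `min_chars` threshold are all hypothetical, and the three stages are passed in as plain callables so the sketch runs without any AWS setup. In a real deployment each callable would presumably wrap a Lambda invocation.

```python
from typing import Callable, Dict
from urllib.parse import urlparse


def verify_url(url: str) -> str:
    """Return a normalized URL, or raise ValueError if it is not usable.

    Hypothetical helper: the real validation rules are not given in the issue.
    """
    candidate = url.strip()
    if "://" not in candidate:
        # Assume plain http when the user typed a bare domain.
        candidate = "http://" + candidate
    parsed = urlparse(candidate)
    if parsed.scheme not in ("http", "https") or "." not in parsed.netloc:
        raise ValueError(f"not a valid website URL: {url!r}")
    return candidate


def handle_request(
    url: str,
    collect_articles: Callable[[str], str],           # stands in for article_collector
    classify: Callable[[str], Dict[str, float]],      # stands in for the classifier Lambda
    plot: Callable[[Dict[str, float]], str],          # stands in for plotter
    min_chars: int = 100,                             # assumed validity threshold
) -> str:
    """Orchestrate one user request; do no heavy lifting here."""
    target = verify_url(url)                          # step 1: verify URL
    text = collect_articles(target)                   # step 2: crawl and scrape
    if len(text.strip()) < min_chars:                 # step 3: check validity
        raise ValueError("not enough article text to classify")
    scores = classify(text)                           # step 3: relay to classifier
    return plot(scores)                               # step 4: make plots, return location


if __name__ == "__main__":
    # Stub stages so the flow can be exercised locally.
    location = handle_request(
        "example.com",
        collect_articles=lambda u: "lorem ipsum " * 20,
        classify=lambda t: {"label_a": 0.7, "label_b": 0.3},
        plot=lambda s: "plots/example.png",
    )
    print(location)
```

Passing the stages in as arguments keeps webserver_get testable on its own, and swapping a local function for a Lambda call later would not change the orchestration code.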

Refactor code to be clearer if/when possible.

N2ITN commented 6 years ago

Fixed.