The webserver_get.py module is central to the web app: it encapsulates the backend of a user request. Right now it's a tangled mess.
It currently performs all of these functions:
Verifying a URL
Crawling the target website for the URLs of 100 articles
Sending the collected URLs to the web scraping Lambda function asynchronously
Concatenating the text returned by the Lambdas
Sending the text to the deep learning classifier Lambda
Calling the plotter.py module, which renders the results
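Sending URLs to the scraping Lambda and concatenating what comes back (items 3 and 4 above) is the piece most worth isolating first. What follows is a minimal sketch, not the project's current code. It assumes a hypothetical `scrape_one(url) -> str` callable that wraps one call to the web scraping Lambda. Threads are used because each call is I/O-bound, and a failed scrape is treated as empty text rather than aborting the whole batch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def _safe_scrape(scrape_one: Callable[[str], str], url: str) -> str:
    """Call the scraper, turning any exception into an empty result."""
    try:
        return scrape_one(url)
    except Exception:
        return ""


def collect_texts(urls: Iterable[str],
                  scrape_one: Callable[[str], str],
                  max_workers: int = 20) -> str:
    """Scrape every URL concurrently and join the article texts.

    `scrape_one` is a hypothetical stand-in for one invocation of the
    web scraping Lambda; it should return the article text (or "").
    """
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in input order, so output is deterministic
        texts = list(pool.map(lambda u: _safe_scrape(scrape_one, u), url_list))
    # Skip empty results (failed or blank scrapes) before concatenating
    return "\n\n".join(t for t in texts if t)
```

Injecting `scrape_one` keeps the fan-out logic testable locally, whether it ends up running on EC2 or inside a Lambda.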
This is way too much for one module to handle and still be maintainable. The goal is to break out each discrete piece of functionality so that it can run as its own AWS Lambda function. Deploying the Lambdas will be tracked in a separate issue.
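Each future Lambda needs an entry point. The Lambda Python runtime calls a handler with the signature `handler(event, context)`, and `lambda_handler` is the conventional name. One way to keep a module usable both on EC2 and as a Lambda is a thin handler around a plain function. The sketch below assumes a hypothetical event shape `{"url": ...}` and a hypothetical `collect_articles` placeholder standing in for the real crawl-and-scrape logic.

```python
from typing import Any, Dict


def collect_articles(url: str) -> str:
    # Hypothetical placeholder for the crawl + scrape logic that would
    # move out of webserver_get.py into article_collector.py.
    return f"articles from {url}"


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Entry point the Lambda runtime calls; `context` is unused here."""
    # Assumed event shape: {"url": "https://example.com"}
    url = event.get("url")
    if not url:
        return {"ok": False, "error": "missing 'url' in event"}
    # Return a JSON-serialisable dict; a synchronous caller reads it from
    # the invocation response payload.
    return {"ok": True, "text": collect_articles(url)}
```

Because the handler only unpacks the event and packs the result, `collect_articles` can be unit-tested without AWS.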
Tasks
Split the code accordingly. Most of the work is moving part of it to article_collector.
Here is the suggested execution order, with the idea that webserver_get mostly calls other functions but doesn't do much on its own.
webserver_get.py: Sits on EC2 as before. Verifies the URL and sends it to article_collector.
article_collector.py: Future Lambda function. Crawls the target website, sends the URLs to the web scraping Lambdas, and returns the concatenated article text.
webserver_get.py: Receives the text results, checks their validity, and relays them to the deep learning classifier Lambda. Sends the classifier results to plotter.py.
plotter.py: Future Lambda function. Makes plots and sends them to S3. (Right now this module sits on the web server and saves plot images locally.)
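The suggested execution order could look roughly like this in webserver_get.py. This is a sketch under several assumptions. The Lambda function names (`article_collector`, `classifier`, `plotter`), the payload keys (`url`, `text`, `scores`) and the `{"text": ...}` response shape are all hypothetical. `invoke` is injected so the flow can be tested without AWS. `make_lambda_invoker` shows one way to back it with boto3's `Lambda.Client.invoke`. The URL check is only a syntactic one and does not test reachability.

```python
import json
from typing import Any, Callable, Dict
from urllib.parse import urlparse

# (function_name, payload) -> decoded JSON response
Invoker = Callable[[str, Dict[str, Any]], Any]


def make_lambda_invoker() -> Invoker:
    """Build an invoker backed by boto3 (imported lazily so tests need no AWS)."""
    import boto3
    client = boto3.client("lambda")

    def invoke(function_name: str, payload: Dict[str, Any]) -> Any:
        resp = client.invoke(
            FunctionName=function_name,
            InvocationType="RequestResponse",  # synchronous call
            Payload=json.dumps(payload).encode("utf-8"),
        )
        return json.loads(resp["Payload"].read())

    return invoke


def is_valid_url(url: str) -> bool:
    """Minimal syntactic check: http(s) scheme plus a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def handle_request(url: str, invoke: Invoker) -> Dict[str, Any]:
    """Orchestrate one user request; Lambda names are hypothetical."""
    if not is_valid_url(url):
        return {"ok": False, "error": "invalid URL"}
    collected = invoke("article_collector", {"url": url})
    text = collected.get("text", "") if isinstance(collected, dict) else ""
    if not text.strip():  # validity check before spending a classifier call
        return {"ok": False, "error": "no article text collected"}
    scores = invoke("classifier", {"text": text})
    plot = invoke("plotter", {"scores": scores})
    return {"ok": True, "scores": scores, "plot": plot}
```

With this shape webserver_get only sequences calls and validates results, which matches the idea that it "mostly calls other functions but doesn't do much on its own."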
Refactor code to be clearer if/when possible.
Status
Assigning to myself - help welcome!