The backend service for The Sentimentalists article analysis app.
The source code is written in Python, and the app is built and deployed on AWS Lambda.
This service is automatically built and deployed to AWS when code is merged to master; see the workflow steps in .github/workflows/deploy.yml. It is built with Terraform modules inherited from our shared infrastructure repo.
To build locally, see INSTALL.md.
The SENTIMENTALISTSAPP-BACKEND is divided into the following folders:
infra/prod
Contains the Terraform files.
src
Contains the source code of our Python modules.
download_punkt.py
Downloads the NLTK punkt tokenizer models, which are used by the TextBlob library, ahead of time in order to reduce the size of the package the automation passes to AWS Lambda.
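A minimal sketch of what such a helper might look like (the function names and the target path layout are illustrative assumptions, not the repo's actual code):

```python
import os


def punkt_target(base_dir="nltk_data"):
    # Where nltk.download places the punkt models inside the download
    # directory; the base directory name is an assumption.
    return os.path.join(base_dir, "tokenizers", "punkt")


def download_punkt(base_dir="nltk_data"):
    # Download only the punkt corpus into a local directory, so the Lambda
    # package ships just this data instead of the full NLTK distribution.
    import nltk  # third-party; imported lazily inside the function

    os.makedirs(base_dir, exist_ok=True)
    nltk.download("punkt", download_dir=base_dir)
    return punkt_target(base_dir)
```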
get_bias_score.py
Calculates the TRUST SCORE based on the credibility, polarity and subjectivity values.
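A sketch of how such a score could combine the three signals. The weights and formula below are illustrative assumptions, not the app's actual computation:

```python
def get_bias_score(credibility, polarity, subjectivity):
    """Combine the three signals into a trust score in [0, 1].

    Assumed convention: high credibility raises trust, while strong
    polarity (far from neutral) and high subjectivity lower it.
    """
    objectivity = 1.0 - subjectivity    # 1.0 = fully objective text
    neutrality = 1.0 - abs(polarity)    # 1.0 = fully neutral tone
    # Hypothetical weighting: credibility dominates, tone refines.
    return round(0.5 * credibility + 0.25 * neutrality + 0.25 * objectivity, 2)
```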
get_credibility_score.py
Calls the Source Credibility API Gateway, passing the URL. Returns the URL's credibility score, its category (Left Center, Fake News, ...) and the source which rated the website (Media Bias / Fact Check, etc.).
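A minimal sketch of such a call, assuming a hypothetical endpoint URL and JSON field names (the real API Gateway address and response schema are not shown here):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint; the real one is an API Gateway URL from the infra.
CREDIBILITY_API = "https://example.com/source-credibility"


def parse_credibility_response(body):
    # Field names below are assumptions about the API's JSON body.
    data = json.loads(body)
    return {
        "score": data["credibility"],
        "category": data.get("category"),
        "source": data.get("source"),
    }


def get_credibility_score(url):
    query = urlencode({"url": url})
    with urlopen(f"{CREDIBILITY_API}?{query}") as response:
        return parse_credibility_response(response.read())
```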
get_secret.py
Calls AWS Secrets Manager and returns the requested secret as a dict of key/value pairs.
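A sketch of the usual boto3 pattern for this (the region default is an assumption; the real module may read it from the environment):

```python
import json


def parse_secret(secret_string):
    # Secrets Manager stores key/value secrets as a JSON string.
    return json.loads(secret_string)


def get_secret(secret_name, region_name="us-east-1"):
    """Fetch a secret from AWS Secrets Manager and return it as a dict."""
    import boto3  # third-party; imported lazily so parse_secret works offline

    client = boto3.client("secretsmanager", region_name=region_name)
    response = client.get_secret_value(SecretId=secret_name)
    return parse_secret(response["SecretString"])
```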
get_text.py
Calls the Python library Newspaper, which retrieves the article text from a URL.
Returns the TEXT, HEADER, SUMMARY, KEYWORDS and TOP_IMAGE of the news article.
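A sketch of the newspaper3k call sequence this describes; the JSON field names are assumptions about what the frontend expects, and the real module may add error handling:

```python
def article_to_payload(article):
    # Map a parsed Article's attributes onto the fields listed above
    # (field names are assumptions).
    return {
        "text": article.text,
        "header": article.title,
        "summary": article.summary,
        "keywords": article.keywords,
        "top_image": article.top_image,
    }


def get_text(url):
    from newspaper import Article  # third-party: newspaper3k

    article = Article(url)
    article.download()
    article.parse()
    article.nlp()  # populates summary/keywords; needs the NLTK punkt data
    return article_to_payload(article)
```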
lambda_function.py
Main module of our backend app. First it validates the URL, then it calls the following Python modules:
1) get_credibility_score.py
2) sentiment_analysis.py
3) get_bias_score.py
4) spacy_matcher.py
Each of these modules returns results that populate our JSON response, which is sent to the frontend via AWS Lambda.
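Two helpers sketch the validation and response steps described above; the module names in the comment and the response shape are assumptions based on AWS Lambda proxy-integration conventions:

```python
import json
from urllib.parse import urlparse


def is_valid_url(url):
    # Minimal validation sketch: require an http(s) scheme and a host.
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def build_response(status_code, payload):
    # The response shape AWS Lambda proxy integrations expect.
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }

# The real handler would then merge the results of get_credibility_score,
# sentiment_analysis, get_bias_score and spacy_matcher into one payload
# and return build_response(200, merged_payload).
```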
sentiment_analysis.py
Reads a URL, then calls the function getText to convert the HTML into unformatted text. It then calls the Python library TextBlob, which analyses the "sentiment" of the text, and finally returns the polarity and subjectivity of the whole text.
spacy_matcher.py
Calls the Python library spaCy with the TEXT to be analysed.
The output of this function is a list with dictionary pairs: {'type' : tag, 'topic' : obj}.
The tags that can be returned by the spaCy library are:
PERSON - People, including fictional.
ORG - Companies, agencies, institutions, etc.
GPE - Countries, cities, states.
PERCENT - Percentage, including "%".
LANGUAGE - Any named language.
DATE - Absolute or relative dates or periods.
TIME - Times smaller than a day.
LOC - Non-GPE locations, mountain ranges, bodies of water.
NORP - Nationalities or religious or political groups.
EVENT - Named hurricanes, battles, wars, sports events, etc.
WORK_OF_ART - Titles of books, songs, etc.
MONEY - Monetary values, including unit.
QUANTITY - Measurements, as of weight or distance.
ORDINAL - “first”, “second”, etc.
CARDINAL - Numerals that do not fall under another type (not ordinal, quantity, etc.).
PS: Our app frontend currently uses the following spaCy tags: PERSON, ORG, GPE, EVENT and WORK_OF_ART.
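The filtering step can be sketched as below. The function name is an assumption; it accepts any iterable of entity-like objects with `label_` and `text` attributes, so a spaCy Doc's `.ents` works directly:

```python
# Tags the frontend currently consumes, per the note above.
FRONTEND_TAGS = {"PERSON", "ORG", "GPE", "EVENT", "WORK_OF_ART"}


def match_entities(ents, allowed_tags=FRONTEND_TAGS):
    """Map entities onto the {'type': tag, 'topic': obj} pairs described
    above, keeping only the tags the frontend uses."""
    return [
        {"type": ent.label_, "topic": ent.text}
        for ent in ents
        if ent.label_ in allowed_tags
    ]
```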
The following files are used in the automation: they install objects, compress or delete them, or point to the Python libraries that must be installed:
tests
Contains the Python modules used to run the tests (pytest library).
We are currently running 35 tests.
INSTALL.md (file)
The file INSTALL.md contains the commands used to create the local Anaconda environment, the settings used to enable pytest execution, and the important environment variables to set locally.
SCOPE.md (file)
The file SCOPE.md lists the libraries and APIs used in the backend code, along with ideas that could be implemented in future MVPs.