When communicating a complex or unfamiliar concept, jargon creates a barrier to understanding. We are often asked to give an explanation to an audience assumed to have no technical or specialised knowledge. This can be particularly challenging, as experts are often unaware of when they are using jargon, or even of what constitutes jargon. We have developed a tool that identifies jargon, gives the user a metric to rate the jargon content, and makes suggestions for alternatives.
It improves upon existing tools in this space in many ways:
This software takes some text (in US English) and calculates the proportion of commonly used words. A score of 0% means all of the words are jargon; a score of 100% means none of the words are jargon. Proper nouns, single characters, abbreviations and numbers are excluded from the calculation. Words are also reduced to their stem (i.e. plurals are singularised; past and future tenses are transformed to the present tense) to reduce the false positive rate.
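The scoring described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual implementation: the word list COMMON_STEMS, the functions naive_stem and common_word_score, and the heuristics for spotting proper nouns and abbreviations are all hypothetical stand-ins, and a real stemmer and word list would be far more thorough.

```python
import re

# Hypothetical stand-in for a real list of commonly used word stems.
COMMON_STEMS = {"the", "cat", "walk", "on", "mat"}


def naive_stem(word):
    """Crude stemmer (assumption): strips a few common suffixes only."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def common_word_score(text, common_stems=COMMON_STEMS):
    """Return the percentage of counted words that are common, or None."""
    counted = 0
    common = 0
    sentence_start = True
    for match in re.finditer(r"[A-Za-z0-9]+|[.!?]", text):
        token = match.group()
        if token in {".", "!", "?"}:
            sentence_start = True
            continue
        at_start = sentence_start
        sentence_start = False
        # Exclusions: single characters, numbers, abbreviations (all caps)
        # and proper nouns (capitalised mid-sentence; a rough heuristic).
        if len(token) == 1:
            continue
        if any(ch.isdigit() for ch in token):
            continue
        if token.isupper():
            continue
        if token[0].isupper() and not at_start:
            continue
        counted += 1
        if naive_stem(token.lower()) in common_stems:
            common += 1
    if counted == 0:
        return None  # nothing left to score after exclusions
    return 100.0 * common / counted
```

With this sketch, common_word_score("The cats measured photons.") counts four words, of which two ("the" and "cats") reduce to common stems. Note that the proper-noun heuristic cannot recognise a name at the start of a sentence.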
The software can calculate multiple metrics where jargon is classified in several ways.
The first of these is implemented already; others will be added (see issues list on GitHub).
Issue #30 - Flag other characteristics associated with a high-level comprehension including:
Issue #14 - Check that the text is in English, and check its spelling and grammar, before analysing.
Issue #31 - Return word cloud with jargon and common words coloured differently.
Issue #13 - Take sound files/clips, transcribe them into text and analyse.
TODO: Some instructions for users wanting just to run the system locally. Link to a hosted version.
It is best practice to work on Python projects within a virtual environment, to avoid conflicts with your main system installation. The virtualenv tool can be installed following the instructions at https://virtualenv.pypa.io/en/stable/installation/
Clone this Git repository, then navigate to the folder where you cloned it in a terminal and run the following sequence of commands to set up a virtual environment and install all the project's dependencies.
(Note for Windows users: these assume a POSIX-style shell, so will work in git-bash, but not the standard Windows shell. For that, you'll probably need python3.exe in place of python3, and venv\Scripts\activate as the second line.)
virtualenv -p python3 venv
source venv/bin/activate
pip install -r requirements.txt
pip install -r test_requirements.txt # For testing
You can then install the project package itself in 'developer mode', so that changes made to files in your working copy are reflected in the installed package too:
pip install -e .
Installing with conda in its own environment.
# create conda environment based on environment.yml file
conda env create
# activate the environment
source activate JargonProfiler
Tests are run with pytest. To run all the tests, use
pytest test
New tests should be written in files inside the test folder, named either test_*.py or *_test.py. The tests themselves are functions with names starting test_ and taking no arguments. They check expected behaviour using assert statements.
See the pytest documentation for more details.
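As an illustration of these conventions, a test file might look something like the sketch below. The file name and the singularise helper are hypothetical and defined inline only to keep the example self-contained; in practice a test would import the function under test from the project package instead.

```python
# Hypothetical contents of test/test_singularise.py


def singularise(word):
    """Hypothetical helper, used only for this illustration."""
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word


def test_plural_is_singularised():
    # Name starts with test_ and the function takes no arguments.
    assert singularise("cats") == "cat"


def test_short_word_is_unchanged():
    assert singularise("is") == "is"
```

Running pytest test from the repository root should then collect both functions, since the file name matches test_*.py.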
The requirements.txt file used above is generated from a specification in requirements.in by pip-tools. This ensures that we list the exact versions used of all our dependencies, including indirect ones. If you are adding a new dependency, add it to requirements.in and then run
pip install pip-tools # First time only!
pip-compile
To upgrade dependencies to their latest versions use
pip-compile --upgrade
bower install
python runserver.py