MediaUncovered / NewsAnalysis

Use word embeddings to uncover bias in newspapers.

NewsAnalysis

Trains, evaluates and analyses newspaper word embeddings.

Run script and set variables

The run.py file contains the steps required to build, evaluate and analyse a word embedding model from a database. Parameters, such as the database access settings or the training settings, are set as environment variables. For example, to change the number of documents used to train the word embedding model, set:

export NO_DOCS=42

Then run the script:

python3 run.py
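The README does not show how run.py reads these variables. As a minimal sketch, assuming an integer setting such as NO_DOCS is read with a fallback default, the pattern could look like this. The helper name read_int_env and the default of 1000 are hypothetical. The code avoids f-strings so it also runs on Python 3.5.

```python
import os


def read_int_env(name, default):
    """Return environment variable `name` as an int, or `default` if it is unset or empty."""
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw)
    except ValueError:
        # Re-raise with the variable name so a misconfigured environment is easy to spot.
        raise ValueError("{} must be an integer, got {!r}".format(name, raw)) from None


if __name__ == "__main__":
    # Hypothetical default: the value run.py really falls back to is not documented here.
    no_docs = read_int_env("NO_DOCS", default=1000)
    print("Training on {} documents".format(no_docs))
```

With export NO_DOCS=42 in the shell, this prints "Training on 42 documents"; a non-numeric value fails fast with a message naming the variable.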

Install dependencies

To install the dependencies, make sure you have Python 3.5 and pip installed, then clone or download the repository.

Upgrade pip:

pip install -U pip

Install the dependencies with:

pip install -r requirements.txt

Create the models and data directories:

mkdir models data

Word Embeddings

Visualisation

The word embedding model is visualised with the standalone version of the TensorFlow Embedding Projector. Clone the projector's git repository and initialise newsAnalysis/Projector.py with the relative path to that repository. Model.visualise() then automatically loads a trained model into the browser, where users can explore its words and their relations.
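Model.visualise() handles the export itself. If you want to load vectors into the projector by hand, the following sketch writes the two tab-separated files the Embedding Projector accepts through its Load dialog: one row of numbers per word, plus a one-column metadata file listing the words in the same order. The function name, the file names and the dict input are assumptions, not part of this repository.

```python
import os


def export_for_projector(vectors, out_dir):
    """Write word vectors as two TSV files for the Embedding Projector.

    vectors: dict mapping each word to a sequence of floats (all the same length).
    Returns the paths (vectors_tsv, metadata_tsv).
    """
    os.makedirs(out_dir, exist_ok=True)
    vectors_path = os.path.join(out_dir, "vectors.tsv")
    metadata_path = os.path.join(out_dir, "metadata.tsv")
    words = sorted(vectors)  # fixed order shared by both files
    with open(vectors_path, "w", newline="") as vec_file, \
            open(metadata_path, "w", newline="", encoding="utf-8") as meta_file:
        for word in words:
            # One row of tab-separated floats per word.
            vec_file.write("\t".join(str(float(x)) for x in vectors[word]) + "\n")
            # Single-column metadata needs no header line: one label per row.
            meta_file.write(word + "\n")
    return vectors_path, metadata_path
```

With gensim 4.x, for example, the input dict could be built as {w: model.wv[w] for w in model.wv.index_to_key}. Older gensim versions expose the vocabulary differently.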

Compatibility test

To make sure your work also runs as intended on other machines, run the acceptance test.

To do this, first copy docker-compose.yml.example to a new file named docker-compose.yml. Next, fill in the environment variables with your own values.

Now you can build and run the acceptance test.

docker-compose build
docker-compose up

This creates a Docker container that installs all requirements from requirements.txt and runs the newsAnalysis.run.py file. The generated data is stored in the ./data directory, which is created by this process.
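After docker-compose up finishes, a quick way to confirm that the run actually produced output is to list what ended up in ./data. The helper below is hypothetical and not part of the repository. It only assumes that a successful run writes at least one file somewhere under that directory.

```python
import os


def list_generated_files(data_dir="./data"):
    """Return the sorted paths, relative to data_dir, of all files under data_dir.

    Raises FileNotFoundError if the directory is missing and RuntimeError if it
    contains no files, both of which suggest the containerised run failed.
    """
    if not os.path.isdir(data_dir):
        raise FileNotFoundError("{} does not exist".format(data_dir))
    found = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            found.append(os.path.relpath(os.path.join(root, name), data_dir))
    if not found:
        raise RuntimeError("{} is empty".format(data_dir))
    return sorted(found)
```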

Unittests

Unit tests are run automatically in the Docker container. To run them separately, use nose:

nosetests unittests/
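nose collects modules whose names match its test pattern (for example test_config.py) and runs the unittest.TestCase classes inside them. The sketch below is a hypothetical unittests/test_config.py. The helper under test is defined inline so the file runs on its own. In the repository you would import the real code from newsAnalysis instead.

```python
import os
import unittest


def read_int_env(name, default):
    # Hypothetical stand-in for a configuration helper in newsAnalysis.
    raw = os.environ.get(name)
    return default if raw in (None, "") else int(raw)


class ReadIntEnvTest(unittest.TestCase):
    def setUp(self):
        # Remove NO_DOCS for the test and remember the original value.
        self._saved = os.environ.pop("NO_DOCS", None)

    def tearDown(self):
        # Restore the environment exactly as it was before the test.
        os.environ.pop("NO_DOCS", None)
        if self._saved is not None:
            os.environ["NO_DOCS"] = self._saved

    def test_default_when_unset(self):
        self.assertEqual(read_int_env("NO_DOCS", 7), 7)

    def test_parses_integer(self):
        os.environ["NO_DOCS"] = "42"
        self.assertEqual(read_int_env("NO_DOCS", 7), 42)
```

Running nosetests unittests/ with this file in place should report two passing tests.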

Packaging

To package newsanalysis into a wheel, run python setup.py bdist bdist_wheel. The wheel file is then saved as dist/newsanalysis-<VERSION>-py3-none-any.whl.

You can then install the package in your other environments with pip install <PATH-TO-PACKAGE>.
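The wheel name follows the PEP 427 convention {name}-{version}(-{build tag})-{python tag}-{abi tag}-{platform tag}.whl, where the build tag is optional. So py3-none-any marks a pure-Python 3 package that is not tied to a particular ABI or platform. The sketch below splits such a name into its tags and finds the built wheels in dist/ to pass to pip install. The function names and the example version 0.1.0 are made up for illustration.

```python
import glob
import os


def parse_wheel_name(filename):
    """Split a wheel file name into its PEP 427 components.

    Format: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    """
    base = os.path.basename(filename)
    if not base.endswith(".whl"):
        raise ValueError("not a wheel: {}".format(base))
    parts = base[:-len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 parts when the optional build tag is present
        raise ValueError("unexpected wheel name: {}".format(base))
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {"name": parts[0], "version": parts[1], "python": python_tag,
            "abi": abi_tag, "platform": platform_tag}


def find_wheels(dist_dir="dist", name="newsanalysis"):
    """Return the wheel paths in dist_dir for the given distribution name."""
    pattern = os.path.join(dist_dir, "{}-*.whl".format(name))
    return sorted(glob.glob(pattern))
```

Note that find_wheels sorts by plain string order, not by version, so check the result before installing if dist/ holds several builds.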