ewulczyn / wiki-detox

See https://meta.wikimedia.org/wiki/Research:Modeling_Talk_Page_Abuse

Wikipedia Detox

The repository is part of the Wikipedia Detox Research project. See the getting started guide to build your own models and run your own experiments.

This repository holds the codebase associated with the paper Ex Machina: Personal Attacks Seen at Scale by Ellery Wulczyn, Nithum Thain, and Lucas Dixon, published in February 2017 and presented at WWW-2017.

More recent development is now happening in the repositories of https://conversationai.github.io/

Setup using python virtual env

Assumes you have python/pip installed and set up.

Set up your python virtual env (assumes python 3.5):

# Set up a new python virtual env for this project; only needs to be done once
# per setup
virtualenv -p python3.5 tmp/env
source tmp/env/bin/activate
pip3 install -r requirements.txt

Test that it works:

# Enter your python virtual environment
source tmp/env/bin/activate
echo '
import tensorflow as tf
hello = tf.constant("Hello, TensorFlow!")
sess = tf.Session()
print(sess.run(hello))
' | python

Which should output:

b'Hello, TensorFlow!'
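
If the test fails on the import, one possible cause is that the interpreter being run is not the one from tmp/env. The following is a minimal sketch for checking that, not part of this repository: the function names are hypothetical, and it relies only on the standard `sys.prefix`, `sys.base_prefix`, and (for older virtualenv releases) `sys.real_prefix` attributes.

```python
import sys


def looks_like_virtual_env(prefix, base_prefix, real_prefix=None):
    """Return True when the given prefixes indicate a virtual environment.

    Hypothetical helper: older virtualenv releases set sys.real_prefix,
    while venv and newer virtualenv releases make sys.prefix differ from
    sys.base_prefix.
    """
    return real_prefix is not None or prefix != base_prefix


def current_interpreter_in_virtual_env():
    """Apply the check above to the interpreter running this script."""
    return looks_like_virtual_env(
        sys.prefix,
        getattr(sys, "base_prefix", sys.prefix),
        getattr(sys, "real_prefix", None),
    )


if __name__ == "__main__":
    print("python version:", sys.version.split()[0])
    print("prefix:", sys.prefix)
    print("inside virtual env:", current_interpreter_in_virtual_env())
```

Run it after `source tmp/env/bin/activate`; if the last line reports False, the `python` on your PATH is probably not the one installed under tmp/env.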

Setup datasets and train models from Figshare data

Assumes you have set up your python virtual environment.

# Enter the python virtual env
source tmp/env/bin/activate
# Create the local datasets and models directories.
mkdir -p tmp/datasets && mkdir -p tmp/models
# Download datasets and train models
python src/modeling/get_prod_models.py --task recipient_attack \
  --data_dir tmp/datasets --model_dir ${PWD}/tmp/models
python src/modeling/get_prod_models.py --task attack \
  --data_dir tmp/datasets --model_dir ${PWD}/tmp/models
python src/modeling/get_prod_models.py --task aggression \
  --data_dir tmp/datasets --model_dir tmp/models
ln -s ./tmp/models ./models
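
After the commands above finish, it can be useful to confirm that the `models` symlink resolves to `tmp/models` and to see what was written there. The sketch below is an illustration rather than part of the repository: `check_models_link` is a hypothetical helper, and it makes no assumption about the names of the files that `get_prod_models.py` produces.

```python
import os


def check_models_link(link_path="models",
                      target_path=os.path.join("tmp", "models")):
    """Return the sorted entries under link_path if it resolves to target_path.

    Hypothetical helper; raises RuntimeError when the link is missing or
    points somewhere other than target_path.
    """
    if not os.path.isdir(link_path):
        raise RuntimeError("%s is missing or not a directory" % link_path)
    if os.path.realpath(link_path) != os.path.realpath(target_path):
        raise RuntimeError(
            "%s does not resolve to %s" % (link_path, target_path))
    return sorted(os.listdir(link_path))


if __name__ == "__main__":
    # Run from the repository root, where the symlink was created.
    for name in check_models_link():
        print(name)
```

An empty listing would suggest that the training commands did not write anything to the model directory.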

Start a jupyter notebook

# Enter the python virtual env
source tmp/env/bin/activate
# Start jupyter
jupyter notebook