adrian-pace / FROG-analytics

Metrics and visualizations on the behaviour of users in various online editors

# FROG-analytics

Analytics algorithms and visualizations for collaborative writing. It currently handles three collaborative document editors: Etherpad, Collab-react-components and FROG.

Setup

Python Environment

You will need Python 3 and the following dependencies. The csv, ast, argparse and sqlite3 modules ship with the Python standard library, so only the third-party packages need to be installed:

pip install numpy
pip install pymongo
pip install matplotlib
pip install pandas
pip install seaborn
pip install flask
pip install flask

Etherpad

Download Etherpad from http://etherpad.org/ and extract it into a folder named etherpad.

You can start Etherpad by running start.bat.
Then connect to http://localhost:9001.

Note: if you want to join the pad from another machine, replace localhost with the IP address of the host. Collaborating users need to be on the same network as the host or connect to it remotely.

Start editing.

All edits are stored in etherpad/var/dirty.db.
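If you want to inspect dirty.db yourself, the sketch below reads it with the standard library only. It assumes Etherpad's default storage layout rather than anything defined in this repository: dirty.db is a JSON-lines file with one {"key": ..., "val": ...} object per line (a later line for the same key overrides an earlier one), and each revision is stored under a key of the form pad:<padId>:revs:<n> whose value holds the changeset and a meta object with author and timestamp. load_revisions is a hypothetical helper, not part of analytics.py.

```python
import json
import re
from collections import defaultdict

# Assumed Etherpad key layout for revisions: "pad:<padId>:revs:<revNumber>".
REV_KEY = re.compile(r"^pad:(?P<pad>.+):revs:(?P<rev>\d+)$")


def load_revisions(path):
    """Hypothetical helper: {pad_id: [(rev, author, timestamp, changeset), ...]}.

    Assumes dirty.db is JSON lines of {"key": ..., "val": ...}; later lines
    override earlier ones and a line without "val" marks a deleted key.
    """
    latest = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            latest[record["key"]] = record.get("val")

    pads = defaultdict(list)
    for key, val in latest.items():
        match = REV_KEY.match(key)
        if match is None or val is None:
            continue  # not a revision, or a deleted key
        meta = val.get("meta", {})
        pads[match["pad"]].append(
            (int(match["rev"]), meta.get("author"), meta.get("timestamp"), val["changeset"])
        )
    for revs in pads.values():
        revs.sort()  # order each pad's revisions by revision number
    return dict(pads)
```

Keeping only the last value per key mirrors the append-only way such a log file is written, so an edited or deleted key is not counted twice.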

The changeset format is described at http://policypad.readthedocs.io/en/latest/changesets.html.
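As an illustration of that format, the sketch below splits a changeset string into its parts: a Z: header with the original document length and the size change (both base 36, with > for growth and < for shrinkage), a sequence of ops (optional *n attribute references, an optional |n newline count, then +, - or = followed by a base-36 character count), and the inserted characters after $. parse_changeset is a hypothetical helper written for this README, not the parser used by analytics.py.

```python
import re

# Header: "Z:" + original length, then ">" (grew) or "<" (shrank) + delta, in base 36.
HEADER = re.compile(r"^Z:([0-9a-z]+)([><])([0-9a-z]+)")
# One op: attribute refs (*n)*, optional newline count (|n), opcode, char count.
OP = re.compile(r"((?:\*[0-9a-z]+)*)(?:\|([0-9a-z]+))?([-+=])([0-9a-z]+)")


def parse_changeset(cs):
    """Hypothetical helper: split an Etherpad changeset into its components."""
    header = HEADER.match(cs)
    if header is None:
        raise ValueError(f"not a changeset: {cs!r}")
    old_len = int(header.group(1), 36)
    delta = int(header.group(3), 36)
    new_len = old_len + delta if header.group(2) == ">" else old_len - delta

    # Everything after the first "$" is the char bank (the inserted text).
    body, _, char_bank = cs[header.end():].partition("$")
    ops = []
    for attribs, lines, opcode, chars in OP.findall(body):
        ops.append({
            "opcode": opcode,  # "+" insert, "-" delete, "=" keep
            "chars": int(chars, 36),
            "lines": int(lines, 36) if lines else 0,
            "attribs": [int(a, 36) for a in attribs.split("*") if a],
        })
    return {"old_len": old_len, "new_len": new_len, "ops": ops, "char_bank": char_bank}
```

For example, parse_changeset("Z:196>1|5=97=b+1$x") reports an original length of 1626 and a new length of 1627, with the ops "=" over 331 characters spanning 5 lines, "=" over 11 characters and "+" of 1 character, and "x" as the inserted text.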

Collab-react-components

To use the collaborative text editor written by Dario Anongba, first install pymongo:

pip install pymongo

Clone Dario's Collab-React-Components repository into any folder (it does not have to be inside this repository's folder).
Install Node.js and npm.
Install React (version 15.4.0 or later) with npm in Dario's folder:

git clone https://github.com/chili-epfl/collab-react-components
cd collab-react-components
npm install --save react
npm install --save react-dom

Then run npm init and npm install in the root folder and in the demos/collab-editor folder:

npm init
npm install
cd demos/collab-editor
npm init
npm install

Replace server.js with the corresponding file from this repository so that the edits are stored in the MongoDB database. Then start MongoDB and launch the program with npm start:

service mongod start
cd demos/collab-editor
npm start

The program reads the edits from MongoDB with pymongo.
To delete the logs, run the Python script "drop mongo database.py":

python "drop mongo database.py"

Find a detailed tutorial on MongoDB. The stored ops follow the ottypes text format (https://github.com/ottypes/text).
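To make the op format concrete, here is a minimal sketch that applies one op to a document string. It assumes the component types described in the ottypes/text README: an integer skips that many characters, a string is inserted at the current position, and a {"d": N} object deletes N characters (the sketch also accepts a deleted string, in case stored ops use that form). apply_text_op is a hypothetical helper, not part of this repository.

```python
def apply_text_op(doc, op):
    """Hypothetical helper: apply an ottypes/text-style op (list of components).

    Assumed components: int N -> skip N chars; str s -> insert s;
    {"d": N} or {"d": "str"} -> delete N chars (or len("str") chars).
    """
    out = []
    pos = 0  # position in the original document
    for component in op:
        if isinstance(component, int):
            out.append(doc[pos:pos + component])  # keep skipped text
            pos += component
        elif isinstance(component, str):
            out.append(component)  # insert new text
        elif isinstance(component, dict) and "d" in component:
            deleted = component["d"]
            pos += deleted if isinstance(deleted, int) else len(deleted)
        else:
            raise ValueError(f"unknown op component: {component!r}")
    if pos > len(doc):
        raise ValueError("op skips or deletes past the end of the document")
    out.append(doc[pos:])  # characters after the last component are kept
    return "".join(out)
```

For example, apply_text_op("hello world", [6, {"d": 5}, "there"]) returns "hello there". Ops need not skip to the end of the document, which is why the remaining characters are appended at the end.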

Usage

In its current state, the program groups the fine-grained writing events (ElementaryOperation) provided by the editors into Operations: lists of writing events by the same author, within a short time lapse and at the same position. It then deduces Paragraphs, each being the collection of operations on the same line. This allows us to compute metrics from the context of an operation:
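A minimal sketch of the grouping step, under stated assumptions: the ElementaryOperation and Operation classes below are simplified hypothetical stand-ins for the repository's classes, the "short time lapse" is a hypothetical 5 second threshold (MAX_GAP_MS), and "same position" is read as "the new event starts where the previous event of the Operation ended". The repository's actual thresholds and position rules may differ.

```python
from dataclasses import dataclass, field
from typing import List

MAX_GAP_MS = 5_000  # hypothetical "short time lapse" threshold


@dataclass
class ElementaryOperation:  # hypothetical, simplified stand-in
    author: str
    timestamp: int  # milliseconds
    position: int   # index where the event starts
    text: str       # inserted text ("" for a deletion)


@dataclass
class Operation:  # hypothetical, simplified stand-in
    author: str
    elem_ops: List[ElementaryOperation] = field(default_factory=list)

    def end_position(self):
        last = self.elem_ops[-1]
        return last.position + len(last.text)


def group_into_operations(elem_ops, max_gap_ms=MAX_GAP_MS):
    """Merge consecutive events by the same author, close in time and position."""
    operations = []
    for eo in sorted(elem_ops, key=lambda e: e.timestamp):
        current = operations[-1] if operations else None
        if (current is not None
                and current.author == eo.author
                and eo.timestamp - current.elem_ops[-1].timestamp <= max_gap_ms
                and eo.position == current.end_position()):
            current.elem_ops.append(eo)  # continues the same Operation
        else:
            operations.append(Operation(eo.author, [eo]))
    # Note: backspace runs (positions moving left) are not chained by this sketch.
    return operations
```

Paragraphs could then be built by bucketing the resulting Operations by the line they fall on.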

Command line execution

We have provided a small command line interface to interact with the program.

python analytics.py

With the following arguments:

usage: analytics.py [-h] [-p PATH_TO_DB] [-e {etherpad,stian_logs,collab-react-components}] [-t] [-viz] [-v] [--subset_of_pads SUBSET_OF_PADS | --specific_pad SPECIFIC_PAD]

Run the analytics.

optional arguments:
  -h, --help            show this help message and exit
  -p PATH_TO_DB, --path_to_db PATH_TO_DB
                        path to database storing the edits
  -e, --editor {etherpad,stian_logs,collab-react-components}
                        What editor are the logs from
  -t, --texts           Print the texts colored by ops and by author
  -viz, --visualization
                        Display the visualization (proportion of participation
                        of the users per pad/paragraph, how synchronous they
                        are...
  -v, --verbosity       increase output verbosity (you can put -v or -vv)
  --subset_of_pads SUBSET_OF_PADS
                        Size of the subset of pads we will process
  --specific_pad SPECIFIC_PAD
                        Process only one pad
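For readers extending the interface, the help text above maps naturally onto Python's argparse. The sketch below is a hypothetical reconstruction from that help text only, not the actual code in analytics.py: -v uses action="count" so that -vv gives a verbosity of 2, and --subset_of_pads and --specific_pad sit in a mutually exclusive group, matching the "|" in the usage line.

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of the analytics.py parser from its help text."""
    parser = argparse.ArgumentParser(description="Run the analytics.")
    parser.add_argument("-p", "--path_to_db", help="path to database storing the edits")
    parser.add_argument("-e", "--editor",
                        choices=["etherpad", "stian_logs", "collab-react-components"],
                        help="What editor are the logs from")
    parser.add_argument("-t", "--texts", action="store_true",
                        help="Print the texts colored by ops and by author")
    parser.add_argument("-viz", "--visualization", action="store_true",
                        help="Display the visualization")
    # action="count": -v gives 1, -vv gives 2.
    parser.add_argument("-v", "--verbosity", action="count", default=0,
                        help="increase output verbosity (you can put -v or -vv)")
    # Only one of the two pad selectors may be given.
    pads = parser.add_mutually_exclusive_group()
    pads.add_argument("--subset_of_pads", type=int,
                      help="Size of the subset of pads we will process")
    pads.add_argument("--specific_pad", help="Process only one pad")
    return parser
```

With this parser, passing both --subset_of_pads and --specific_pad makes argparse exit with an error instead of silently picking one.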

Below are a few examples of execution.

Examples of execution

Etherpad

To run the program on data collected with the Etherpad editor, use the following command:

python analytics.py -p "etherpad/var/dirty.db" -e etherpad -t -viz -v --specific_pad "First Pad"

This extracts the logs from etherpad/var/dirty.db and provides a few insights on the pad "First Pad". Remove --specific_pad "First Pad" to display the insights for all pads in the logs instead. The -t argument prints the text colored by authors and by operations, and the -viz argument displays the various visualizations explained later.

Note: your terminal may not support colors. In that case, run the program in a Python console or remove the -t argument.

Collab-React-Components

To run the program on all the pads of the Collab-React-Components editor, use the following command:

python analytics.py -e collab-react-components -t -viz -v

Note: there is no need to specify the path of the logs since they are stored in the MongoDB database.

Stian logs

Finally, to run the program on a subset of the pads from Stian's logs, use the following command. These logs have their own editor option because they are not stored in exactly the same way as Etherpad stores them:

python analytics.py -p "stian logs/store.csv" -e stian_logs -t -viz -v --subset_of_pads 10

This will display the visualizations and colored texts for the first 10 pads.

Live analytics for Etherpad and collab-react-components

Live analytics updates the metrics as soon as there is a change in the document. You can run it with:

python live_analytics.py

Note: live_analytics.py does not take any command line arguments; configure the editor you are using in config.py instead.

Live analytics for FROG

Live analytics for FROG receives, in the JSON body of an HTTP POST, the pad_names it should follow; a regex can be sent instead. Then, every few seconds (the refresh rate is configurable in config.py), the program sends an HTTP POST to a listening server whose address is specified in config.py.
You can run the server with:

export FLASK_APP=server.py
flask run --host=0.0.0.0

The HTTP POST defining the pads to follow should have the following format:

{"pad_names": ["/ac-textarea/default/0", "/ac-textarea/default/1"]}

Alternatively, send a regex:

{"regex": "^/ac-textarea/default"}

Finally, sending an HTTP POST without a JSON body returns updates on all the pads that exist at the time of the POST.
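A client for the three kinds of request can be sketched with the standard library. The server URL below is an assumption (Flask's development server listens on port 5000 by default, but the route that server.py exposes is not documented here), and follow_request is a hypothetical helper; check server.py for the actual endpoint before using it.

```python
import json
import urllib.request

# Assumed address: Flask's default port; the route "/" is a guess.
SERVER_URL = "http://localhost:5000/"


def follow_request(url=SERVER_URL, pad_names=None, regex=None):
    """Hypothetical helper: build the POST telling FROG live analytics what to follow."""
    if pad_names is not None:
        payload = {"pad_names": list(pad_names)}
    elif regex is not None:
        payload = {"regex": regex}
    else:
        # No JSON body: follow every pad that exists at the time of the POST.
        return urllib.request.Request(url, data=b"", method="POST")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is then a matter of urllib.request.urlopen(follow_request(regex="^/ac-textarea/default")) once the server is running. The request is built separately from the network call so that it can be inspected without a live server.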

The answer will be of the following format:

{<pad name>: {'Alternating score:': 0.75,
              'Break score day:': 4.286906128731559e-06,
              'Break score short:': 0.0,
              'Overall delete type score:': 0.043478260869565216,
              'Overall edit type score:': 0.6086956521739131,
              'Overall paste type score:': 0.0,
              'Overall write type score:': 0.34782608695652173,
              'Proportion score:': 0.7069812420203414,
              'Synchronous score:': 0.9533678756476682,
              'User delete score:': 3.855291626815406e-05,
              'User edit score:': 0.860781638418886,
              'User paste score:': 4.626349952178488e-05,
              'User proportion per paragraph score': 3.855291626815406e-05,
              'User write score:': 0.8015329750891311,
              'text': <document text>,
              'text_colored_by_authors': <text colored by authors>,
              'text_colored_by_ops': <text colored by ops>}

Architecture

The whole program uses various files:

Visualization

Note: the visualization methods are implemented in visualization.py.

Show the text colored by author using display_text_colored_by_authors. Here is an example with the admin and two other authors:

Show the same text with Operations randomly colored using display_text_colored_by_ops:
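The note about terminals that do not handle colors suggests terminal escape codes. As a hedged illustration only (visualization.py may work differently), the sketch below colors text segments with ANSI escape codes, giving each key its own color. color_by_author is a hypothetical helper; passing operation ids instead of author names gives coloring by operation.

```python
from itertools import cycle

# ANSI foreground colors; terminals without ANSI support show the raw codes.
PALETTE = ["\033[31m", "\033[32m", "\033[34m", "\033[35m", "\033[36m", "\033[33m"]
RESET = "\033[0m"


def color_by_author(segments):
    """Hypothetical helper. segments: list of (key, text) pairs in document order.

    Each distinct key (an author, or an operation id) gets the next palette
    color, reused for all of its segments.
    """
    colors = {}
    palette = cycle(PALETTE)  # wraps around if there are more keys than colors
    out = []
    for key, text in segments:
        if key not in colors:
            colors[key] = next(palette)
        out.append(f"{colors[key]}{text}{RESET}")
    return "".join(out)
```

For example, print(color_by_author([("admin", "Hello "), ("a1", "world"), ("admin", "!")])) shows the admin's text in red and a1's text in green.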

Show each user's overall proportion of participation in the pad using display_user_participation:

Note: we count participation in absolute terms, so if a user deletes a line, for example, the deletion counts as participation. See below for a visualization that separates additions and deletions.
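The "absolute" counting rule can be written down in a few lines. This is a minimal sketch assuming participation is measured in characters, with deleted characters counted alongside added ones; participation_proportions and its (author, chars_added, chars_deleted) input are hypothetical, not the repository's data model.

```python
from collections import defaultdict


def participation_proportions(edits):
    """Hypothetical helper. edits: iterable of (author, chars_added, chars_deleted).

    Deleted characters count as participation, just like added ones.
    """
    totals = defaultdict(int)
    for author, added, deleted in edits:
        totals[author] += abs(added) + abs(deleted)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {author: n / grand_total for author, n in totals.items()}
```

For instance, an author who writes 30 characters and another who writes 10 and deletes 10 end up with shares of 0.6 and 0.4: the deletion raises the second author's share even though it removes text.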

Show the same proportion as before but for each Paragraph using display_user_participation_paragraphs:

Show the same proportion as before but with additions and deletions separated, using display_user_participation_paragraphs_with_del:

Show the proportion of the pad written synchronously using display_proportion_sync_in_pad:

Note: the admin is not taken into account here, whereas the next figure includes the admin to stay consistent with the previous figures.

Show the proportion written synchronously in each Paragraph using display_proportion_sync_in_paragraphs:

Show the distribution of the different Operation types (except Jump) in one pad using display_overall_op_type:

Show the same distribution broken down by author using display_types_per_user:
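The four "Overall ... type score" values in the sample live-analytics answer sum to 1 and are reproduced by 8, 14, 1 and 0 operations out of 23 (write, edit, delete, paste), which is consistent with a plain share of operations per type. The sketch below assumes that reading; it is a hypothetical helper, and the repository may weight operations differently (for example by length).

```python
from collections import Counter

OP_TYPES = ("write", "edit", "delete", "paste")  # "jump" is deliberately excluded


def overall_op_type_distribution(op_types):
    """Hypothetical helper: share of each Operation type in a pad, ignoring jumps."""
    counts = Counter(t for t in op_types if t in OP_TYPES)
    total = sum(counts.values())
    return {t: (counts[t] / total if total else 0.0) for t in OP_TYPES}
```

Calling it on 8 writes, 14 edits, 1 delete and any number of jumps gives roughly 0.348, 0.609, 0.043 and 0.0, matching the sample answer's write, edit, delete and paste scores.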

Future work

There is still much to do: