# FROG-analytics
Analytics algorithms and visualizations for collaborative writing. It currently handles three collaborative document editors: Etherpad, Collab-react-components and FROG.
You will need Python 3 and the following dependencies. The csv, ast, argparse and sqlite3 modules are part of the Python standard library and do not need to be installed with pip.
pip install numpy
pip install pymongo
pip install matplotlib
pip install pandas
pip install seaborn
pip install flask
You'll need to download Etherpad from http://etherpad.org/ and extract it into a folder named etherpad.
You can start Etherpad by running start.bat.
Then connect to http://localhost:9001.
Note: If you want to join the pad from another machine, replace localhost with the IP address of the host. Collaborating users need to be on the same network or connect to the host remotely.
Start Editing.
All edits are stored in etherpad/var/dirty.db.
(changeset format: http://policypad.readthedocs.io/en/latest/changesets.html)
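As a rough illustration of how the stored edits could be read, the sketch below groups revision records by pad. It assumes that dirty.db is a text file with one JSON object per line, each with "key" and "val" fields, and that revisions are stored under keys of the form pad:<pad name>:revs:<number>. Both the layout and the function name read_revisions are assumptions made for this example; they are not taken from this repo.

```python
import json
import re
from collections import defaultdict

# Assumed key layout for revision records: "pad:<pad name>:revs:<number>".
REVISION_KEY = re.compile(r"^pad:(.+):revs:(\d+)$")


def read_revisions(path):
    """Return {pad name: [(revision number, value), ...]} sorted by revision.

    Hypothetical helper: assumes one JSON object per line with "key" and
    "val" fields.
    """
    revisions = defaultdict(list)
    with open(path, encoding="utf-8") as db:
        for line in db:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            match = REVISION_KEY.match(record.get("key", ""))
            # Skip everything that is not a pad revision (authors, sessions...).
            if match:
                pad, number = match.group(1), int(match.group(2))
                revisions[pad].append((number, record["val"]))
    # Do not rely on file order; sort each pad's revisions by number.
    return {pad: sorted(revs, key=lambda r: r[0]) for pad, revs in revisions.items()}
```

Each returned value would then hold the changeset string to decode with the changeset format linked above.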
To use the collaborative text editor written by Dario Anongba, first install pymongo:
pip install pymongo
Clone Dario's Collab-React-Components into any folder (it does not have to be inside this repo):
git clone https://github.com/chili-epfl/collab-react-components
Install node.js/npm, then install react with npm (version 15.4.0+) in Dario's folder:
npm install --save react
npm install --save react-dom
Run npm init and npm install in the root folder and in the demos folder:
npm init
npm install
cd demos/collab-editor
npm init
npm install
Replace server.js with the corresponding file in this repo so that the edits are stored in the Mongo database. Then start the program with npm start:
service mongod start
cd demos/collab-editor
npm start
To access the edits, the program will use pymongo.
To delete the logs, run the Python script drop mongo database.py:
python "drop mongo database.py"
Find a detailed tutorial on MongoDB (op format https://github.com/ottypes/text)
In its current state, the program groups the fine-grained writing events (ElementaryOperation) provided by the editors into Operations (lists of writing events by the same author within a short time lapse and at the same position). Paragraphs are then deduced; each is the collection of Operations on the same line. This allows us to compute metrics from the context of an operation.
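The grouping step described above can be sketched as follows. This is a minimal illustration, not the code used in this repo: the field names, the group_operations function and the max_delay and max_distance thresholds are all assumptions, and the sketch ignores deletions as well as position shifts caused by other authors' edits.

```python
from dataclasses import dataclass, field


@dataclass
class ElementaryOperation:
    author: str
    timestamp: float  # seconds (assumed unit)
    position: int     # character offset in the document
    text: str         # inserted text; deletions are not modeled in this sketch


@dataclass
class Operation:
    author: str
    elem_ops: list = field(default_factory=list)


def group_operations(elem_ops, max_delay=5.0, max_distance=1):
    """Merge events by one author that are close in time and position.

    max_delay and max_distance are hypothetical thresholds.
    """
    operations = []
    last_by_author = {}  # author -> (current Operation, last event in it)
    for op in sorted(elem_ops, key=lambda e: e.timestamp):
        previous = last_by_author.get(op.author)
        if previous is not None:
            operation, last = previous
            # Where the author's cursor should be after the last insertion.
            expected = last.position + len(last.text)
            close_in_time = op.timestamp - last.timestamp <= max_delay
            close_in_space = abs(op.position - expected) <= max_distance
            if close_in_time and close_in_space:
                operation.elem_ops.append(op)
                last_by_author[op.author] = (operation, op)
                continue
        # Otherwise start a new Operation for this author.
        operation = Operation(author=op.author, elem_ops=[op])
        operations.append(operation)
        last_by_author[op.author] = (operation, op)
    return operations
```

Tracking the last event per author, rather than the last event overall, keeps two authors typing at the same time from splitting each other's Operations.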
We have provided a small command line interface to interact with the program.
python analytics.py
With the following arguments:
usage: analytics.py [-h] [-p PATH_TO_DB] [-e {etherpad,stian_logs,collab-react-components}] [-t] [-viz] [-v] [--subset_of_pads SUBSET_OF_PADS | --specific_pad SPECIFIC_PAD]
Run the analytics.
optional arguments:
-h, --help show this help message and exit
-p PATH_TO_DB, --path_to_db PATH_TO_DB
path to database storing the edits
-e, --editor {etherpad,stian_logs,collab-react-components}
What editor are the logs from
-t, --texts Print the texts colored by ops and by author
-viz, --visualization
Display the visualization (proportion of participation
of the users per pad/paragraph, how synchronous they
are...)
-v, --verbosity increase output verbosity (you can put -v or -vv)
--subset_of_pads SUBSET_OF_PADS
Size of the subset of pads we will process
--specific_pad SPECIFIC_PAD
Process only one pad
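For readers who want to see how such an interface can be declared, here is a sketch of an argparse parser that accepts the arguments listed above. It is a reconstruction from the help text only, not the parser in analytics.py: the build_parser name, the integer type for --subset_of_pads and the default values are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="analytics.py",
                                     description="Run the analytics.")
    parser.add_argument("-p", "--path_to_db",
                        help="path to database storing the edits")
    parser.add_argument("-e", "--editor",
                        choices=["etherpad", "stian_logs", "collab-react-components"],
                        help="What editor are the logs from")
    parser.add_argument("-t", "--texts", action="store_true",
                        help="Print the texts colored by ops and by author")
    parser.add_argument("-viz", "--visualization", action="store_true",
                        help="Display the visualization")
    # action="count" lets -v and -vv raise the verbosity level.
    parser.add_argument("-v", "--verbosity", action="count", default=0,
                        help="increase output verbosity")
    # The usage line shows these two options as mutually exclusive.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--subset_of_pads", type=int,  # int is an assumption
                       help="Size of the subset of pads we will process")
    group.add_argument("--specific_pad", help="Process only one pad")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this sketch, passing both --subset_of_pads and --specific_pad makes argparse exit with an error, which matches the `|` in the usage line.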
Below are a few examples of execution.
To launch the program on data collected with the Etherpad editor, you can run the following command:
python analytics.py -p "etherpad/dirty.db" -e etherpad -t -viz -v --specific_pad "First Pad"
This will extract the logs in etherpad/dirty.db and provide a few insights on the pad "First Pad". You could also remove --specific_pad "First Pad" to display the insights for all pads in the logs. The -t argument will print the text colored by authors and by operations. The -viz argument will display the various visualizations explained later.
Note: your terminal may not handle colors. In that case, run the program in a Python console or remove the -t argument.
If you would like to run the program on all the pads of the Collab-React-Components editor, you can run the following command:
python analytics.py -e collab-react-components -t -viz -v
Note: there is no need to specify the path of the logs since they are stored in the Mongo database.
Finally, if you would like to run the program on a subset of the pads from Stian's logs, you can run the following command (stian_logs is a separate editor option because these logs are not stored in exactly the same way as Etherpad stores them):
python analytics.py -p "stian logs/store.csv" -e stian_logs -t -viz -v --subset_of_pads 10
This will display the visualizations and colored texts for the first 10 pads.
Live analytics updates the metrics as soon as there is a change in the document. You can run it with:
python live_analytics.py
Note: live_analytics.py doesn't take any command line arguments. We suggest you configure which editor you are using in config.py.
Live analytics for FROG receives, by HTTP POST, a JSON body listing the pad_names to follow. The body can also contain a regex instead. Then every few seconds (the refresh rate is configurable in config.py), the program sends an HTTP POST to a listening server (address specified in config.py).
You can run the server with:
export FLASK_APP=server.py
flask run --host=0.0.0.0
The HTTP POST defining the pad_names we want to follow should have the following format:
{'pad_names': ["/ac-textarea/default/0","/ac-textarea/default/0"]}
Alternatively, we can send a regex:
{'regex': "^/ac-textarea/default"}
Finally, by sending an HTTP POST without a JSON body, we will get updates on all the pads that exist at the time of the POST.
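The three kinds of request described above could be built from Python as in the sketch below, which uses only the standard library. The URL is an assumption: flask run listens on port 5000 by default, but the route ("/") is a placeholder, so check server.py for the actual one. The build_request helper is hypothetical.

```python
import json
import urllib.request

# Assumed address: default `flask run` port, placeholder route.
SERVER_URL = "http://localhost:5000/"


def build_request(payload=None, url=SERVER_URL):
    """Build (without sending) a POST request with an optional JSON body."""
    if payload is None:
        # No body: ask for updates on all existing pads.
        data, headers = None, {}
    else:
        data = json.dumps(payload).encode("utf-8")
        headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


follow_pads = build_request({"pad_names": ["/ac-textarea/default/0"]})
follow_regex = build_request({"regex": "^/ac-textarea/default"})
follow_all = build_request()
```

A request built this way can be sent with urllib.request.urlopen(follow_pads) once the server is running.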
The answer will be of the following format:
{<pad name>: {'Alternating score:': 0.75,
'Break score day:': 4.286906128731559e-06,
'Break score short:': 0.0,
'Overall delete type score:': 0.043478260869565216,
'Overall edit type score:': 0.6086956521739131,
'Overall paste type score:': 0.0,
'Overall write type score:': 0.34782608695652173,
'Proportion score:': 0.7069812420203414,
'Synchronous score:': 0.9533678756476682,
'User delete score:': 3.855291626815406e-05,
'User edit score:': 0.860781638418886,
'User paste score:': 4.626349952178488e-05,
'User proportion per paragraph score': 3.855291626815406e-05,
'User write score:': 0.8015329750891311,
'text': <document text>,
'text_colored_by_authors': <text colored by authors>,
'text_colored_by_ops': <text colored by ops>}}
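Since the score keys above do not all end with a colon, a consumer may want to normalize them before use. The sketch below assumes the answer for one pad has already been parsed into a Python dict; the split_pad_report helper is hypothetical and not part of this repo.

```python
def split_pad_report(report):
    """Separate numeric scores from text fields for one pad.

    Trailing colons are stripped from score names so that
    'Proportion score:' and 'User proportion per paragraph score'
    follow the same convention.
    """
    scores, texts = {}, {}
    for key, value in report.items():
        if isinstance(value, (int, float)):
            scores[key.rstrip(":").strip()] = value
        else:
            texts[key] = value
    return scores, texts


example = {
    "Alternating score:": 0.75,
    "User proportion per paragraph score": 3.855291626815406e-05,
    "text": "Hello world",
}
scores, texts = split_pad_report(example)
```

Here scores holds the two numeric entries under colon-free names and texts keeps the document text untouched.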
The whole program uses various files:
Note: The implementation of the visualization methods can be found in visualization.py.
Show the text with the different authors using display_text_colored_by_authors. Here is an example with the admin and two other authors:
Show the same text with Operations randomly colored using display_text_colored_by_ops:
Show the overall proportion of participation of the pad using display_user_participation:
Note: We consider participation to be absolute, so if a user deletes a line, for example, it counts as participation. See below for a separate visualization.
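To make the note about absolute participation concrete, the sketch below computes each author's share when insertions and deletions both count by their size. The input format (pairs of author and signed length, negative for deletions) and the participation_proportions name are assumptions for this example, not the data structures used in visualization.py.

```python
from collections import Counter


def participation_proportions(edits):
    """Return {author: share of total edited characters}.

    `edits` is an iterable of (author, signed_length) pairs, where a
    negative length stands for a deletion (assumed representation).
    """
    totals = Counter()
    for author, length in edits:
        # Deletions count as participation too, hence the absolute value.
        totals[author] += abs(length)
    overall = sum(totals.values())
    if overall == 0:
        return {}
    return {author: total / overall for author, total in totals.items()}


shares = participation_proportions([("alice", 30), ("bob", -10), ("alice", -10)])
```

In this example alice accounts for 40 of the 50 edited characters and bob for 10, even though bob only deleted text.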
Show the same proportion as before but for each Paragraph using display_user_participation_paragraphs:
Show the same proportion as before but with additions and deletions separated using display_user_participation_paragraphs_with_del:
Show the proportion of the pad written synchronously using display_proportion_sync_in_pad:
Note: We don't take the admin into account here, whereas in the next figure we do, in order to be consistent with the previous figures.
Show the proportion written synchronously for each Paragraph using display_proportion_sync_in_paragraphs:
Show the distribution of the different Operation types (except Jump) in one pad using display_overall_op_type:
Show the same as above but per author using display_types_per_user:
There is still much to do: