What is FASTREAD_ECL?
FASTREAD_ECL is a tool that supports primary study selection in systematic literature reviews. It is implemented on HPCC so that it can scale up to massive data sets.
Latest Versions:
Setting up FASTREAD
-
Setting up HPCC:
- Install HPCC virtual machine
- Install HPCC Client Tools
- Add C:\Program Files (x86)\HPCCSystems\xx.xx\clienttools\bin (the installation path of ecl.exe) to your PATH environment variable.
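To confirm that the client tools directory really ended up on your PATH, a small check like the one below can help. It is only a sketch: the function name `find_ecl` is made up for illustration, and it simply searches each PATH entry for a file named `ecl` or `ecl.exe`.

```python
import os


def find_ecl(path=None):
    """Return the full path of the ecl executable, or None if it is not found.

    `path` defaults to the PATH environment variable; passing it explicitly
    is only meant to make the function easy to try out.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        for name in ("ecl", "ecl.exe"):
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate):
                return candidate
    return None
```

Calling `find_ecl()` in a fresh terminal returns `None` when the directory is still missing from PATH.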
-
Setting up Python:
- We use anaconda by continuum.io (see Why?)
- We won't need the entire distribution. Download a Python 2.7 version & install a minimal version of anaconda.
- Make sure you select add to PATH during install.
-
Getting dependencies:
- get flask package from anaconda: run
conda install flask
in your terminal/shell.
- get ecl-ml from GitHub
- install libsvm development package on your HPCC nodes: ECLAGENT, THOR, THORMASTER, ECLCCSERVER.
- for the virtual box case, just run
sudo apt-get install libsvm-dev
on your virtual machine
- for the case of multiple nodes, install the libsvm development package on every node
- put ecl-ml in the same directory as FASTREAD_ECL, so that the two folders sit side by side.
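The two local dependencies above can be sanity-checked from Python. The helpers below are a minimal sketch with hypothetical names: one checks that a folder called `ecl-ml` sits next to the FASTREAD_ECL folder you pass in, the other checks that `flask` can be imported. Neither verifies the libsvm installation on the HPCC nodes.

```python
import os


def ecl_ml_is_sibling(fastread_dir):
    """True if a folder named 'ecl-ml' sits next to the given FASTREAD_ECL folder."""
    parent = os.path.dirname(os.path.abspath(fastread_dir))
    return os.path.isdir(os.path.join(parent, "ecl-ml"))


def flask_is_installed():
    """True if the flask package can be imported by the current interpreter."""
    try:
        import flask  # noqa: F401  (imported only to test availability)
    except ImportError:
        return False
    return True
```

For example, `ecl_ml_is_sibling("path/to/FASTREAD_ECL")` should be `True` before you start the tool.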
Use FASTREAD_ECL
-
Running HPCC in virtual box:
- Follow the instructions here
-
Get data ready:
- Prepare a csv file like this:
- The 'label' column stores the true label of each entry. If no true label is available, leave it blank or set it to 'unknown'.
- Remove the header (first row) of your data file.
- There are some example data files in FASTREAD_ECL > UI > workspace > data
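The preparation steps above (fill in the 'label' column, then drop the header row) can be sketched in Python as follows. This is an illustration only: the function name is made up, it works on rows already parsed with the `csv` module, and writing 'unknown' into blank labels is a choice made here, since a blank label is equally acceptable.

```python
def prepare_rows(rows):
    """Drop the header row and write 'unknown' into blank 'label' cells.

    `rows` is a sequence of rows as produced by csv.reader, with the header
    as the first row. The header must contain a column named 'label'.
    """
    rows = [list(row) for row in rows]
    header, body = rows[0], rows[1:]
    label_index = header.index("label")
    for row in body:
        if not row[label_index].strip():
            row[label_index] = "unknown"
    return body  # header removed, ready to be written back out
```

With a hypothetical header `["title", "abstract", "label"]`, the row `["A", "text", ""]` comes back as `["A", "text", "unknown"]` and the header itself is no longer part of the result.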
- Open ECL_Watch: http://ecl_watch_ip:8010
- Upload your data file to the HPCC landing zone (files > Landing Zones > Upload):
- Spray it:
- Select uploaded file
- Click Spray: Delimited
- Change the Target Scope to fastread::
- Hit Spray at bottom right
-
Running script:
- Navigate to FASTREAD_ECL > UI > src and run index.py.
- If all is well, you'll be greeted by this:
-
The Interface:
-
Load the data:
- Click Scan button to see what data files are on your HPCC system. Then select the data to work on from the selection tab.
- The first load can take up to several minutes. Once the data is successfully loaded, you will see the following:
-
Begin reviewing studies:
- choose from Relevant, Irrelevant, or Undetermined for each study and hit Submit.
- hit Next when you want to review more.
- statistics are displayed as Documents Coded: x/y (z), where x is the number of relevant studies retrieved, y is the number of studies reviewed, and z is the total number of candidate studies.
- when x is greater than or equal to 1, an SVM model will be trained after hitting Next.
- keep reviewing studies until you think most relevant ones have been retrieved.
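The status line and the training rule described above can be written down in a few lines of Python. This is a sketch of the stated rules, not the tool's actual code; both function names are hypothetical.

```python
def format_status(relevant, reviewed, total):
    """Build the 'Documents Coded: x/y (z)' line from the three counts."""
    return "Documents Coded: {}/{} ({})".format(relevant, reviewed, total)


def should_train(relevant):
    """An SVM model is trained on Next once at least one relevant study is found."""
    return relevant >= 1
```

For instance, after reviewing 20 of 500 candidate studies and marking 3 as relevant, `format_status(3, 20, 500)` gives `Documents Coded: 3/20 (500)`, and `should_train(3)` is `True`.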
-
Export csv:
- Clicking the Export button generates a csv file with your coding in FASTREAD_ECL > UI > workspace > coded.
-
Restart:
- Clicking the Restart button gives you a fresh start and discards all your previous work on the current data.
Version Logs
May 23, 2017. v1.0.0 The very first, basic version is released.