ai-se/FASTREAD_ECL

What is FASTREAD_ECL?

FASTREAD_ECL is a tool that supports primary study selection in systematic literature reviews. It is implemented on HPCC for massive scale-up.

Latest Versions:

On the GitHub repo: https://github.com/ai-se/FASTREAD_ECL.

Setting up FASTREAD

Setting up HPCC:
- Install the HPCC virtual machine.
- Install the HPCC Client Tools, then add C:\Program Files (x86)\HPCCSystems\xx.xx\clienttools\bin (the installation path to ecl.exe) to your PATH environment variable.
Setting up Python:
- We use anaconda by continuum.io (see Why?)
- We won't need the entire distribution. Download a Python 2.7 version & install a minimal version of anaconda.
- Make sure you select the "add to PATH" option during installation.
Getting dependencies:
- get flask package from anaconda: run conda install flask in your terminal/shell.
- get ecl-ml from github
- install libsvm development package on your HPCC nodes: ECLAGENT, THOR, THORMASTER, ECLCCSERVER.
- for the virtual box case, just run sudo apt-get install libsvm-dev on your virtual machine
- for a multi-node cluster, install the libsvm development package on every node
- put ecl-ml in the same directory as FASTREAD_ECL, so that the two folders sit side by side.
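Before moving on, it can help to confirm that the setup steps above took effect. The following is a minimal sketch, not part of FASTREAD_ECL itself: the helper names find_on_path and missing_modules are made up for illustration, and the check only looks for the ecl executable on your PATH and tries to import flask.

```python
import importlib
import os


def find_on_path(program, path=None, extensions=("", ".exe")):
    """Return the first file named `program` found in the PATH directories, or None.

    `path` defaults to the PATH environment variable; pass a string to test
    against a different list of directories.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        for ext in extensions:
            candidate = os.path.join(directory, program + ext)
            if os.path.isfile(candidate):
                return candidate
    return None


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # "ecl" is the client-tools executable mentioned in the setup steps.
    print("ecl found at: %s" % find_on_path("ecl"))
    print("missing Python packages: %s" % missing_modules(["flask"]))
```

If the first line prints None, the Client Tools bin directory is probably not on your PATH yet. If flask is listed as missing, rerun conda install flask.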

Use FASTREAD_ECL

Running HPCC in virtual box:
- follow the instructions here
Get data ready:
- Prepare a csv file like this:
- the 'label' column stores the TRUE label of each entry; if a label is not applicable, leave it blank or set it to 'unknown'.
- Remove the header (first row) of your data file.
- There are some example data files in FASTREAD_ECL > UI > workspace > data
- Open ECL_Watch: http://ecl_watch_ip:8010
- Upload your data file onto the HPCC landing zone (files > Landing Zones > Upload):
- Spray it:
- Select uploaded file
- Click Spray: Delimited
- Change the Target Scope to fastread::
- Hit Spray at bottom right
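The file-preparation steps above (dropping the header row and marking unlabeled entries) can be scripted before you upload. This is a minimal sketch under stated assumptions: it is a standalone helper written for Python 3, the function name strip_header_and_fill_labels is hypothetical, and the only column name taken from this README is 'label'. The other column names and the label value in the sample are placeholders, not the required schema.

```python
import csv
import io


def strip_header_and_fill_labels(csv_text, label_column="label", placeholder="unknown"):
    """Drop the header row and replace blank labels with `placeholder`.

    The header is used only to locate the label column, then discarded,
    because the data file should be uploaded without its first row.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    label_idx = header.index(label_column)  # raises ValueError if the column is absent

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in body:
        if not row:
            continue  # skip empty lines
        if label_idx < len(row) and not row[label_idx].strip():
            row[label_idx] = placeholder
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample: the column names other than 'label' are placeholders.
sample = "title,abstract,label\nPaper A,About X,yes\nPaper B,About Y,\n"
print(strip_header_and_fill_labels(sample))
```

Filling blanks with 'unknown' is optional, since the tool accepts either form. The sketch does it only so that every row is explicit.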
Running script:
- Navigate to FASTREAD_ECL > UI > src and run index.py.
- If all is well, you'll be greeted by this:
The Interface:
- Fire up your browser and go to http://localhost:5000/hello/. You'll see a page like below:
Load the data:
- Click Scan button to see what data files are on your HPCC system. Then select the data to work on from the selection tab.
- The first load can take a few minutes. Once the data is successfully loaded, you will see the following:
Begin reviewing studies:
- choose from Relevant, Irrelevant, or Undetermined for each study and hit Submit.
- hit Next when you want to review more.
- statistics are displayed as Documents Coded: x/y (z), where x is the number of relevant studies retrieved, y is the number of studies reviewed, and z is the total number of candidate studies.
- when x is greater than or equal to 1, an SVM model will be trained after hitting Next.
- keep reviewing studies until you think most relevant ones have been retrieved.
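To make the counter and the training rule above concrete, here is a small sketch. It only mirrors what the list describes; the function names are hypothetical and this is not the code the interface runs.

```python
def format_progress(relevant, reviewed, total):
    """Render the counter as 'Documents Coded: x/y (z)'.

    x = relevant studies retrieved, y = studies reviewed,
    z = total number of candidate studies.
    """
    return "Documents Coded: {}/{} ({})".format(relevant, reviewed, total)


def model_will_train(relevant):
    """An SVM model is trained on Next once at least one relevant study is found."""
    return relevant >= 1


print(format_progress(3, 20, 500))
print(model_will_train(0), model_will_train(1))
```

With 3 relevant studies among 20 reviewed, out of 500 candidates, the counter reads Documents Coded: 3/20 (500). Until the first relevant study is coded, no model is trained.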
Export csv:
- Clicking the Export button generates a csv file with your coding in FASTREAD_ECL > UI > workspace > coded.
Restart:
- Clicking the Restart button gives you a fresh start, and you lose all your previous work on the current data.

Version Logs

May 23, 2017. v1.0.0 The very first, basic version is released.
