clintpgeorge / ediscovery

Active Topic Learning based Legal eDiscovery

Research outline

EDiscovery refers to the management of electronically stored information in litigation, dispute resolution proceedings, and investigations. Machine learning techniques such as supervised classification and unsupervised clustering have been employed to reduce manual (linear human) review and to increase investigative speed and efficiency. We propose to improve on the state of the art of machine learning for EDiscovery by i) using topic modeling to provide greater power than commonly employed methods such as keyword search and Latent Semantic Analysis, ii) using the identified topics to categorize documents and rank their relevance to a given query, and iii) using the topic framework to provide document summaries. Furthermore, to ensure broad adoption of this work, all software tools resulting from it will be implemented in an open-source system that can serve as the basis for an open EDiscovery framework.
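To make item ii) concrete, here is a minimal sketch of ranking documents against a query in topic space. It assumes a topic model has already assigned each document and the query a probability distribution over topics. The document ids, the three-topic distributions, and the choice of cosine similarity are all illustrative assumptions, not the project's actual data or scoring method.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_p * norm_q)


def rank_documents(query_topics, doc_topics):
    """Return (doc_id, score) pairs, most similar to the query first."""
    scores = [
        (doc_id, cosine_similarity(query_topics, topics))
        for doc_id, topics in doc_topics.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical topic distributions over three topics (made-up values).
doc_topics = {
    "email_001": [0.80, 0.15, 0.05],
    "email_002": [0.10, 0.70, 0.20],
    "memo_003": [0.30, 0.30, 0.40],
}
# Hypothetical topic distribution inferred for the query.
query_topics = [0.90, 0.05, 0.05]

ranking = rank_documents(query_topics, doc_topics)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

Documents whose topic mixture points in the same direction as the query's score closest to 1. Other measures over probability distributions, such as Hellinger distance, could be substituted in `cosine_similarity` without changing the ranking code.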

General guidelines

This section provides general guidelines for accessing the GitHub repository, coding style, enhancements, and issue tracking.

To edit this file

See GitHub markdown online help

Enhancements and issues

Git

To clone the ediscovery repository, use the following command:

git clone https://github.com/clintpgeorge/ediscovery

See the crash course on Git SVN for more details. The following are some useful Git commands:

git pull # update the local repository from the remote
git status # show the status of the local repository
git add file_name # stage the new file file_name
git commit -a -m '[commit message]' # commit all modified tracked files to the local repository
git push # push your local commits to the remote repository

Python

pyLucene Installation

Ubuntu

Windows 7 and 8

Topic Modeling

Topic modeling packages can be installed from the Gensim website.

Development Environment Setup

Follow the steps below, in the order given, to set up the development environment. The executables can be found in the software folder (Coming Soon)