EDiscovery refers to the management of electronically stored information in litigation, dispute resolution proceedings, and investigations. Machine learning techniques such as supervised classification and unsupervised clustering have been employed to reduce manual (linear human) review and to increase investigative speed and efficiency. We propose to improve on the state of the art of machine learning for EDiscovery by i) using topic modeling to provide greater power than commonly employed methods such as keyword search and Latent Semantic Analysis, ii) using the identified topics to categorize documents and rank them by relevance to a given query, and iii) using the topic framework to provide document summaries. Furthermore, to ensure broad adoption of this work, all software tools resulting from it will be implemented within an open-source system that can serve as the basis for an open EDiscovery framework.
This section provides general guidelines for accessing the GitHub repository, coding style, enhancements, and issue tracking.
To edit this file
See the GitHub Markdown online help.
Enhancements and issues
git commit -a -m'issue #1 fix: see the issue details for information.'
Git
To clone the ediscovery repository, use the following command:
git clone https://github.com/clintpgeorge/ediscovery
See the crash course on Git SVN for more details. The following are some useful git commands:
git pull # to update the local from the remote
git status # to see the local repository status
git add file_name # to stage a new file named file_name
git commit -a -m '[commit message]' # to commit all modified tracked files in your local repository
git push # to push your local commits to the remote repository
Python
pyLucene Installation
Ubuntu
Windows 7 and 8
Topic Modeling
Topic modeling packages can be installed from the Gensim website.
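Once a topic model has been trained, each document and each query can be described by a vector of topic proportions, and documents can be ranked by how close their vector is to the query's. The following is a minimal sketch of that idea in plain Python, not the project's actual ranking method: the function names, the file names, the topic proportions, and the choice of cosine similarity are all assumptions made for illustration. In practice the vectors would come from a trained topic model rather than being written by hand.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_p * norm_q)


def rank_documents(query_topics, doc_topics):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_topics, topics))
        for doc_id, topics in doc_topics.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical topic proportions over three topics (illustrative values only).
doc_topics = {
    "email_001.txt": [0.80, 0.15, 0.05],
    "email_002.txt": [0.10, 0.70, 0.20],
    "email_003.txt": [0.05, 0.15, 0.80],
}
query_topics = [0.70, 0.20, 0.10]

ranking = rank_documents(query_topics, doc_topics)
for doc_id, score in ranking:
    print("%s\t%.3f" % (doc_id, score))
```

Cosine similarity is used here only because it is simple to state. A divergence between probability distributions would be another reasonable choice for comparing topic proportions.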
Development Environment Setup
Follow the steps below, in the order given, to set up the development environment. The executables can be found in the software folder (coming soon).
Delete the files boot_common.py and boot_common.pyc from C:\Python27\Lib\site-packages\py2exe. Copy the boot_common.py from the software folder to that path, then compile it by running the following code in the Python CLI:
import py_compile
py_compile.compile('boot_common.py')
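The same compile step can also be wrapped in a small script, which is convenient when the setup has to be repeated on several machines. This is only a sketch: `compile_module` is a hypothetical helper name, and writing the compiled file next to the source with a trailing `c` is an assumption that matches the `boot_common.pyc` file mentioned above.

```python
import os
import py_compile


def compile_module(source_path):
    """Byte-compile source_path and return the path of the compiled file."""
    if not os.path.isfile(source_path):
        raise IOError("source file not found: %s" % source_path)
    # Assumed output location: boot_common.py -> boot_common.pyc, same folder.
    compiled_path = source_path + "c"
    # doraise=True raises py_compile.PyCompileError on a compile error
    # instead of only printing the error to stderr.
    py_compile.compile(source_path, cfile=compiled_path, doraise=True)
    return compiled_path
```

For the step above, the call would be `compile_module(r'C:\Python27\Lib\site-packages\py2exe\boot_common.py')`, run with the same Python installation that py2exe uses.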