A utility to check if a document's contents are plagiarised.
The project uses the python-docx module to read .docx files. python-docx has its own set of dependencies. On Debian-based Linux (for example Ubuntu), install the required libraries with:
sudo apt-get install python-setuptools
sudo easy_install pip
sudo pip install PIL
sudo pip install lxml
sudo pip install python-dateutil
sudo pip install docx
sudo apt-get install poppler-utils
sudo apt-get install catdoc
These steps assume you already have Python installed and that Python is on your Windows PATH environment variable.
Download setuptools for your Python version (Python 2.7 in most cases).
Run the .exe file. The installer finds your Python installation from the registry and installs easy_install into the Scripts directory of that installation.
Once the installer has run, add the directory containing easy_install to the Windows PATH environment variable, then run:
easy_install pip
pip install PIL
pip install lxml
pip install python-dateutil
pip install docx
Holds the Twitter Bootstrap CSS and JavaScript files, plus images/glyphicons
Stores configuration data (the path to Python on Windows)
Contains the Python scripts that perform the plagiarism checks
Contains uploaded files
The backend is written in Python and consists of 3 scripts.
Main script, which produces the plagiarism results
Strips HTML tags from text
Helper module that computes the cosine similarity between strings
python main.py sampleText.txt sampleOut.txt
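To illustrate the two helper steps described in the script list (stripping HTML tags and computing cosine similarity between strings), here is a minimal sketch. It is not the project's actual code: the names TagStripper, strip_tags, word_counts and cosine_similarity are hypothetical, it assumes word-count vectors as the basis for the similarity, and it targets the Python 3 standard library (on Python 2.7 the parser lives in the HTMLParser module instead of html.parser).

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser  # Python 2.7: from HTMLParser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: keeps the text nodes of an HTML fragment."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        # Note: this also receives the contents of <script> and <style>.
        self.chunks.append(data)


def strip_tags(html):
    """Return the text of `html` with tags removed and whitespace collapsed."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return " ".join(" ".join(parser.chunks).split())


def word_counts(text):
    """Assumption: a string is represented by its lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between the word-count vectors of two strings."""
    va, vb = word_counts(a), word_counts(b)
    dot = sum(count * vb[word] for word, count in va.items() if word in vb)
    norm = math.sqrt(sum(c * c for c in va.values()) *
                     sum(c * c for c in vb.values()))
    # Two strings with no words at all are treated as having similarity 0.
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    page = "<p>The quick <b>brown</b> fox</p>"
    print(cosine_similarity(strip_tags(page), "the quick brown dog"))
```

With word counts as vectors, identical texts score 1 and texts with no words in common score 0, so a threshold on this score is one simple way to flag suspicious overlap. The actual scripts may tokenise or weight terms differently.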