architshukla / Plagiarism-Checker

A utility to check if a document's contents are plagiarised
GNU General Public License v3.0

Plagiarism-Checker


How it works

Required Libraries

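The project uses the python-docx module to decode docx files. The python-docx module has its own set of dependent libraries. The commands below install these dependencies (PIL, lxml and python-dateutil) along with docx itself; on Linux they also install poppler-utils and catdoc.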

GETTING LIBRARIES ON LINUX

sudo apt-get install python-setuptools
sudo easy_install pip
sudo pip install PIL

sudo pip install lxml

sudo pip install python-dateutil
sudo pip install docx
sudo apt-get install poppler-utils
sudo apt-get install catdoc

GETTING LIBRARIES ON WINDOWS

These steps assume that Python is already installed and that its location is in your Windows PATH environment variable.

Download the setuptools installer that matches your Python version (Python 2.7 in most cases).

Run the .exe file. The installer finds your Python installation from the registry and installs easy_install into the Scripts directory of that installation.

Once the installer has finished, add the directory containing easy_install to the Windows PATH environment variable.

easy_install pip
pip install PIL

pip install lxml

pip install python-dateutil

pip install docx
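
To illustrate what decoding a docx file involves, here is a minimal sketch that uses only the Python 3 standard library instead of the python-docx module the project relies on. It assumes the usual layout of a .docx file: a ZIP archive whose main text is stored in word/document.xml. The function name docx_to_text is hypothetical and is not part of this project.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source):
    """Return the text of a .docx file, one line per paragraph.

    `source` may be a file path or a binary file-like object.
    Hypothetical helper for illustration; not taken from this project.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph is split into runs; each run keeps its text in w:t nodes.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return "\n".join(paragraphs)
```

This sketch ignores headers, footers and footnotes, which live in separate parts of the archive.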

Folder Structure

Holds Twitter Bootstrap CSS and Javascript files and images/glyphicons

Stores configuration data (Path to Python on Windows)

Contains python scripts to perform plagiarism checks

Contains uploaded files

Python Scripts

The backend is written in Python. There are three scripts in total.

Main script, which produces the plagiarism results

Strips HTML tags from text

Helper module that computes the cosine similarity between strings
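
The two helper roles described above can be sketched as follows. This is an illustrative Python 3 sketch, not the project's own code: the names strip_tags and cosine_similarity are hypothetical, and the choice of word-count vectors over lowercase tokens is an assumption about how the strings are compared.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML document."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(html):
    """Return `html` with all tags removed and entities decoded."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


def cosine_similarity(a, b):
    """Cosine similarity of two strings based on word counts (0.0 to 1.0)."""
    # Assumption: tokens are lowercase runs of word characters.
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(
        sum(c * c for c in va.values()) * sum(c * c for c in vb.values())
    )
    # An empty string has no direction, so define its similarity as 0.
    return dot / norm if norm else 0.0
```

For example, cosine_similarity(strip_tags("<p>the cat</p>"), "the dog") compares the word sets {the, cat} and {the, dog}, which share one of two words each. Identical texts score close to 1, and texts with no words in common score 0.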

Usage of Python Script (Standalone)

python main.py sampleText.txt sampleOut.txt