Wazzabeee / copy-spotter

Make plagiarism detection easier. This script finds similar sentences between the given files and highlights them in a side-by-side comparison.
MIT License

Copy Spotter



This program processes the pdf, txt, docx, and odt files found in the given input directory, finds similar sentences, and calculates a similarity percentage. It then displays a similarity table with links to side-by-side comparisons in which the similar sentences are highlighted.
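As a rough illustration of the idea, and not the package's actual implementation, the sketch below splits two texts into overlapping blocks of words and reports the share of blocks they have in common. The function names `word_blocks` and `similarity_percentage`, the default block size of 3, and the Jaccard-style score are all assumptions made for this example.

```python
from __future__ import annotations

import re


def word_blocks(text: str, block_size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word blocks (word n-grams) found in text."""
    words = re.findall(r"\w+", text.lower())
    return {
        tuple(words[i : i + block_size])
        for i in range(len(words) - block_size + 1)
    }


def similarity_percentage(text_a: str, text_b: str, block_size: int = 3) -> float:
    """Jaccard overlap of the two texts' word blocks, as a percentage."""
    blocks_a = word_blocks(text_a, block_size)
    blocks_b = word_blocks(text_b, block_size)
    if not blocks_a or not blocks_b:
        # A text shorter than one block cannot be compared.
        return 0.0
    shared = blocks_a & blocks_b
    return 100.0 * len(shared) / len(blocks_a | blocks_b)


if __name__ == "__main__":
    original = "The quick brown fox jumps over the lazy dog."
    suspect = "A quick brown fox jumps over a sleeping dog."
    print(f"{similarity_percentage(original, suspect):.1f}%")
```

A larger block size makes the comparison stricter, since longer runs of identical words are needed before two documents share a block.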


$ pip install copy-spotter
$ copy-spotter [-s] [-o] [-h] input_directory

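Positional Arguments:

input_directory: the directory containing the pdf, txt, docx, or odt files to analyze, for example: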

├── file_1.docx
├── file_2.pdf
└── file_3.pdf
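
To check which files in a directory match the supported formats before running the tool, a small helper along these lines may be useful. This is a sketch only: `find_supported_files` is a hypothetical name, and both the case-insensitive extension match and the decision not to descend into subdirectories are assumptions, not documented behavior of copy-spotter.

```python
from __future__ import annotations

from pathlib import Path

# Extensions named in this README; case-insensitive matching is an assumption.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".docx", ".odt"}


def find_supported_files(input_directory: str) -> list[Path]:
    """Return the supported files directly inside input_directory, sorted by path."""
    directory = Path(input_directory)
    return sorted(
        path
        for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )


if __name__ == "__main__":
    # List the supported files in the current directory.
    for path in find_supported_files("."):
        print(path.name)
```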

Optional Arguments:

-s: custom block size
-o: output directory
-h: show the help message


# Analyze documents in 'data/pdf/plagiarism', with default settings
$ copy-spotter data/pdf/plagiarism

# Analyze with custom block size and specify output directory
$ copy-spotter data/pdf/plagiarism -s 5 -o results/output
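
When the analysis is launched from a Python script rather than a shell, the same command line can be assembled programmatically. The sketch below uses only the `-s` and `-o` flags shown above; `build_command` is a hypothetical helper for this example and not part of the package.

```python
from __future__ import annotations

import shlex
from typing import Optional


def build_command(
    input_directory: str,
    block_size: Optional[int] = None,
    output_directory: Optional[str] = None,
) -> list[str]:
    """Assemble the copy-spotter argument list, omitting options left unset."""
    command = ["copy-spotter", input_directory]
    if block_size is not None:
        command += ["-s", str(block_size)]
    if output_directory is not None:
        command += ["-o", output_directory]
    return command


if __name__ == "__main__":
    command = build_command(
        "data/pdf/plagiarism", block_size=5, output_directory="results/output"
    )
    # Print the shell-quoted form of the command for inspection.
    print(shlex.join(command))
    # To run it (requires copy-spotter to be installed):
    #   import subprocess
    #   subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a directory name contains spaces.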

Development Setup:

# Clone this repository
$ git clone https://github.com/Wazzabeee/copy-spotter

# Go into the repository
$ cd copy-spotter

# Install requirements
$ pip install -r requirements.txt
$ pip install -r requirements_lint.txt

# Install the pre-commit hooks
$ pip install pre-commit
$ pre-commit install

# Run tests
$ pip install pytest
$ pytest tests/

# Run package locally
$ python -m scripts.main [-s] [-o] [-h] input_directory
