The ultimate responsive, web-based plagiarism checker with embedded OCR, which generates PDF plagiarism reports and supports handwritten documents. Created as an end-of-semester project.
Create the file parser: a utility that reads the allowed file formats (.DOCX and .TXT) and converts their content into a readable string. The string is then tokenized into an iterable list of phrases, split on sentence-ending punctuation, which serves as input to the web scraper and plagiarism checker module. Check the possibility of multilingual support.
Here's a checklist to ensure correct implementation of the features:
[x] Add a file parser that converts file content to a readable string.
[x] Add a converter that tokenizes the input string into an iterable list of phrases, split on sentence-ending punctuation.
[x] Check the possibility of multilingual support.
Perform tests and submit a stable build of the sub-module, as a Python module stored in the /utils/ folder for further use, before the sub-module completion phase deadline.
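The parsing and tokenizing steps described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual module: the function names (`read_file`, `read_docx`, `to_phrases`) are hypothetical, .TXT files are assumed to be UTF-8, and .DOCX files are read with the standard library only, by treating the file as a ZIP archive and pulling text runs out of `word/document.xml`. The set of sentence-ending marks, including the non-Latin ones, is a guess at what basic multilingual support might need.

```python
import re
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used by elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
ALLOWED_SUFFIXES = {".txt", ".docx"}

# Assumption: enders that are normally followed by whitespace
# (Latin marks, Devanagari danda, Arabic question mark) ...
SPACED_ENDERS = ".!?\u0964\u061f"
# ... and CJK full-width enders, which usually have no trailing space.
UNSPACED_ENDERS = "\u3002\uff01\uff1f"
SENTENCE_SPLIT = re.compile(
    rf"(?<=[{SPACED_ENDERS}])\s+|(?<=[{UNSPACED_ENDERS}])\s*"
)


def read_docx(source):
    """Return the paragraph text of a .docx file (path or file-like object)."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Join all text runs inside this paragraph.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)


def read_file(path):
    """Read an allowed file and return its content as one string."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"Unsupported file format: {suffix or path.name}")
    if suffix == ".txt":
        return path.read_text(encoding="utf-8")  # assumed encoding
    return read_docx(path)


def to_phrases(text):
    """Split text into a list of phrases on sentence-ending punctuation."""
    normalized = " ".join(text.split())  # collapse newlines and extra spaces
    return [p.strip() for p in SENTENCE_SPLIT.split(normalized) if p.strip()]
```

A call such as `to_phrases(read_file("essay.docx"))` (the file name is only an example) would then yield the phrase list for the scraper. In this sketch each phrase keeps its ending punctuation, and a period inside a number such as 3.14 does not trigger a split because no whitespace follows it. Abbreviations like "e.g. " would still be split, so a real build may need a more careful tokenizer.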
File Parser & Tokenizer Code