File Parser & Tokenizer Code

Create a file-parsing utility that reads data from the allowed file formats (.DOCX and .TXT) and converts it into a readable string. That string is then tokenized into an iterable list of phrases, split on sentence-ending punctuation, which serves as input to the web scraper & plagiarism checker module. Also check whether multi-lingual support is feasible.
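A minimal sketch of the parsing step, using only the standard library. It assumes .txt files are UTF-8 encoded and relies on the fact that a .docx file is a ZIP archive whose body text sits in `word/document.xml`. The function name `parse_file` is hypothetical. This sketch only collects text runs (`<w:t>`), so tabs, line breaks inside a paragraph, tables and headers/footers are not handled specially; the third-party `python-docx` package (`docx.Document(path).paragraphs`) is a more complete alternative.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace used by WordprocessingML elements inside word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def parse_file(path: str) -> str:
    """Return the text of a .txt or .docx file as one readable string.

    Hypothetical helper: name and signature are illustrative only.
    """
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".txt":
        # Assumption: plain-text inputs are UTF-8 encoded.
        return p.read_text(encoding="utf-8")
    if suffix == ".docx":
        # A .docx is a ZIP archive; the main body is word/document.xml.
        with zipfile.ZipFile(p) as archive:
            root = ET.fromstring(archive.read("word/document.xml"))
        paragraphs = []
        for para in root.iter(f"{W_NS}p"):
            # Concatenate every text run (<w:t>) in the paragraph.
            text = "".join(node.text or "" for node in para.iter(f"{W_NS}t"))
            if text:
                paragraphs.append(text)
        # One paragraph per line in the returned string.
        return "\n".join(paragraphs)
    raise ValueError(f"Unsupported file format: {suffix!r} (expected .txt or .docx)")
```

Raising `ValueError` for any other extension keeps the set of allowed formats explicit, so unsupported uploads fail early instead of reaching the tokenizer.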

Here's a checklist to ensure the features are implemented correctly:

[x] Add file parser for parsing file content to readable string.
[x] Add converter which tokenizes and converts the input string to an iterable string list of phrases split on punctuation.
[x] Check for possibility of multi-lingual support.
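The tokenizer and multi-lingual items above could be sketched with a single regular expression. This is a minimal sketch, not the project's implementation: the function name `tokenize` is hypothetical, and the terminator set is an assumption that adds a few non-Latin sentence endings (CJK full-width `。！？`, Devanagari `।`, Arabic `؟`) to the ASCII `. ! ?`. A run of terminators such as `...` or `?!` stays attached to its phrase.

```python
import re
from typing import List

# Assumed set of sentence-ending punctuation (illustrative, not exhaustive):
# ASCII ". ! ?", CJK full-width "。！？", Devanagari danda "।", Arabic "؟".
_TERMINATORS = ".!?。！？।؟"
_T = re.escape(_TERMINATORS)

# A phrase is a run of non-terminators followed by one or more terminators,
# or a trailing fragment that reaches the end of the text without one.
_PHRASE_RE = re.compile(rf"[^{_T}]+(?:[{_T}]+|$)")


def tokenize(text: str) -> List[str]:
    """Split text into a list of phrases on sentence-ending punctuation.

    Hypothetical helper: name and signature are illustrative only.
    """
    phrases = []
    for match in _PHRASE_RE.finditer(text):
        # Collapse internal whitespace and newlines into single spaces.
        phrase = " ".join(match.group().split())
        if phrase:
            phrases.append(phrase)
    return phrases
```

Because the split is purely character-based, decimals ("3.14") and abbreviations ("e.g.") will be split incorrectly; a language-aware splitter such as NLTK's `sent_tokenize` (which takes a `language` argument) is one option if that matters for plagiarism matching.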

Perform tests and submit a stable build of the sub-module, as a Python module stored in the /utils/ folder for further use, before the sub-module completion phase deadline.

eehab-saadat / PlagPatrol

File Parser & Tokenizer Code #1
