The objective is to create a trigram model from five free English works from Project Gutenberg, downloaded in plain-text UTF-8 format. The process involves:
Select five free books from Project Gutenberg
Preprocess the text
  Remove the preamble and postamble sections
  Keep only ASCII letters, full stops and spaces
  Convert all letters to uppercase
Create the trigram model
  Count the occurrences of each sequence of three characters in the preprocessed text
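The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `strip_gutenberg_boilerplate` and `clean_text` are hypothetical, the start and end marker wording is an assumption (Project Gutenberg files vary between editions), and treating line breaks as spaces is a further assumption that the task description does not spell out.

```python
import re

# Assumed marker wording; check it against the downloaded files and adjust.
START_RE = re.compile(
    r"\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*?\*\*\*",
    re.IGNORECASE | re.DOTALL,
)
END_RE = re.compile(
    r"\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK",
    re.IGNORECASE,
)


def strip_gutenberg_boilerplate(raw: str) -> str:
    """Drop everything before the start marker and from the end marker on."""
    start = START_RE.search(raw)
    if start:
        raw = raw[start.end():]
    end = END_RE.search(raw)
    if end:
        raw = raw[:end.start()]
    return raw


def clean_text(raw: str) -> str:
    """Return the book body reduced to uppercase letters, full stops and spaces."""
    body = strip_gutenberg_boilerplate(raw)
    # Assumption: any run of whitespace (newlines, tabs) becomes one space,
    # so that words on adjacent lines are not glued together.
    body = re.sub(r"\s+", " ", body)
    # Keep only ASCII letters, full stops and spaces.
    body = re.sub(r"[^A-Za-z. ]", "", body)
    return body.upper().strip()
```

If a file has no recognisable markers, the sketch leaves the text uncut and only filters characters, so it is worth checking each book by eye after cleaning.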
Subtasks
[x] Select 5 texts from Project Gutenberg
[x] Download the texts in UTF-8 plain text
[x] Preprocess the texts: remove unwanted characters
[x] Convert all letters to uppercase
[x] Create a function to generate trigrams from the processed text
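One possible shape for the trigram function in the last subtask is sketched below. The names `count_trigrams` and `build_model` are hypothetical, and the sketch assumes the input has already been preprocessed. Counting each book separately and then summing is a design choice here, made so that no trigram spans the boundary between two books.

```python
from collections import Counter
from typing import Iterable


def count_trigrams(text: str) -> Counter:
    """Count every overlapping sequence of three characters in text."""
    # A text shorter than three characters yields an empty Counter.
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def build_model(texts: Iterable[str]) -> Counter:
    """Sum the per-book trigram counts into a single model."""
    model = Counter()
    for text in texts:
        model.update(count_trigrams(text))
    return model
```

For example, `count_trigrams("THE THE.")` counts `"THE"` twice and each of `"HE "`, `"E T"`, `" TH"` and `"HE."` once.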