Cathal-McHale / emerging-technologies


Create a Trigram Model from Five Project Gutenberg Texts #1

Open Cathal-McHale opened 1 month ago

Cathal-McHale commented 1 month ago

Task Overview

The objective is to create a trigram model from five free English works, downloaded from Project Gutenberg in Plain Text UTF-8 format. The process involves:

  1. Select five free books from Project Gutenberg
  2. Preprocess the text
    • Remove the preamble and postamble sections
    • Keep only ASCII letters, full stops and spaces
    • Convert all letters to uppercase
  3. Create the trigram model
    • Count the occurrences of each sequence of three characters in the preprocessed text
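
Steps 2 and 3 could be sketched roughly as below. This is only a minimal sketch under some assumptions, not necessarily how the repository does it: the function names are made up for illustration, the preamble and postamble are assumed to be delimited by lines beginning with `*** START OF` and `*** END OF` (the markers commonly found in Project Gutenberg plain-text files, though older files may differ), and newlines are assumed to count as spaces.

```python
import re
from collections import Counter

# Assumed delimiters; check them against the files actually downloaded.
START_MARKER = "*** START OF"
END_MARKER = "*** END OF"


def strip_preamble_postamble(raw):
    """Return only the lines between the start and end marker lines."""
    lines = raw.splitlines()
    start, end = 0, len(lines)
    for i, line in enumerate(lines):
        if line.startswith(START_MARKER):
            start = i + 1  # body begins on the line after the start marker
        elif line.startswith(END_MARKER):
            end = i  # body stops just before the end marker
            break
    return "\n".join(lines[start:end])


def preprocess(raw):
    """Keep only ASCII letters, full stops and spaces, then uppercase."""
    body = strip_preamble_postamble(raw)
    # Assumption: line breaks and tabs are treated as single spaces.
    body = re.sub(r"\s+", " ", body)
    # Filter before uppercasing so non-ASCII letters are dropped, not mapped.
    body = re.sub(r"[^A-Za-z. ]", "", body)
    return body.upper()


def count_trigrams(text):
    """Count every overlapping sequence of three characters."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))
```

To build one model from five books, the preprocessed texts could be counted separately and the resulting `Counter` objects added together, which avoids creating trigrams that span the boundary between two books. Note that removing punctuation can leave runs of consecutive spaces; whether to collapse them is a design choice the task description leaves open.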

SubTasks

[x] Select 5 texts from Project Gutenberg
[x] Download the texts in UTF-8 plain text
[x] Preprocess the texts: remove unwanted characters
[x] Convert all letters to uppercase
[x] Create a function to generate trigrams from the processed text

Cathal-McHale commented 1 month ago

Task 1 complete.