Closed JamesDoonan1 closed 3 weeks ago
References:
Shannon, C. E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal.
This foundational paper introduced the concept of n-gram models in communication theory, which is the basis for trigram models used in language modeling. Available at: Link

Project Gutenberg. (n.d.). Free eBooks.
Source of the five texts used to build the trigram model. Project Gutenberg provides a large collection of public domain books, making it a popular choice for text mining and natural language processing tasks. Available at: https://www.gutenberg.org/

Natural Language Toolkit (NLTK). (n.d.). NLTK: Natural Language Processing with Python.
A comprehensive library for natural language processing in Python, often used for tasks such as text cleaning, tokenization, and working with n-grams. Though not used directly in this project, NLTK provides valuable insights for implementing language models. Available at: https://www.nltk.org/

Python Documentation. (n.d.). The Python Standard Library.
Official Python documentation for tools like defaultdict used in this project for counting trigrams efficiently. Available at: https://docs.python.org/3/library/collections.html#collections.defaultdict

Markov Chains for Language Modeling. (n.d.). Towards Data Science: An introduction to Markov chains in NLP.
Explains how Markov chains and n-gram models are used for language generation and prediction, which is directly related to the concept of trigram models. Available at: Markov Chains in NLP

SpaCy Documentation. (n.d.). SpaCy: Industrial-strength Natural Language Processing.
Although external libraries were not used, SpaCy's text processing techniques provide inspiration for how to structure and clean data in NLP tasks. Available at: https://spacy.io/
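
For context on the defaultdict reference above, here is a minimal sketch of how trigram counting with defaultdict might look. It assumes character-level trigrams and uses a hypothetical helper name, count_trigrams; the project's actual code may differ.

```python
from collections import defaultdict


def count_trigrams(text):
    """Count overlapping three-character sequences in text (illustrative helper)."""
    counts = defaultdict(int)  # missing keys start at 0, so no existence check is needed
    for i in range(len(text) - 2):
        counts[text[i:i + 3]] += 1
    return counts


counts = count_trigrams("THE THEME")
print(counts["THE"])  # 2
```

Using defaultdict(int) avoids an explicit "if key not in dict" branch on every increment, which is likely why it suits this kind of counting.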
Update references for task 1: either add a new file or put them into the README