Comp220 @ Capilano U - Map ADT class collaboration project (2019)
This project has 4 "deliverable" C modules:
A Map ADT provides the following operations:

- Insert(key, value): inserts a new value into the Map, or updates the value if the key already exists
- Remove(key)
- Get(key)
- HasKey(key)
- Size()
- Clear()
- KeySet()
The public API should also include a "constructor" and a "destructor" function, used by the client to create a new Map and free the Map when done.
The HashMap and TreeMap ADTs must provide exactly the same public API.
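That shared public API might be sketched as a single header included by both implementations. The type and function names below are assumptions for illustration only; the actual project spec may prescribe different signatures:

```c
// map.h -- hypothetical public API shared by HashMap and TreeMap.
// All names here are illustrative, not prescribed by the assignment.

typedef struct Map Map;   // opaque type: clients never see the fields

Map  *MapNew(void);       // "constructor": create an empty Map
void  MapFree(Map *map);  // "destructor": free the Map when done

void   Insert(Map *map, const char *key, int value);
void   Remove(Map *map, const char *key);
int    Get(const Map *map, const char *key);
int    HasKey(const Map *map, const char *key);
int    Size(const Map *map);
void   Clear(Map *map);
char **KeySet(const Map *map);   // caller frees the returned array
```

Because both modules expose the identical interface, client code such as the word-cloud application can be linked against either HashMap or TreeMap without any source changes.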
A "Word Cloud" provides a visual representation of the frequency with which each word occurs in a piece of text. The first step in creating a Word Cloud is to perform a frequency analysis - parse the text and count the number of times each word occurs.
Our small, command-line application will perform this frequency analysis on text read from file. The name of the file is provided as a command line argument:
```
> wordcloud myTextFile.txt
```
The input file may be any plain text file with a maximum word size of 127 characters.
"Words" are any tokens separated by whitespace.
`fscanf(infile, " %127s", word)` can be used to read one word, at most 127 characters, into a buffer sized at least `char word[128]` (note that `word`, not `&word`, is passed: the array decays to the `char *` that `%s` expects).
However, parsing words while ignoring punctuation could be a significant challenge, so this functionality should be built in phases.
The program counts the frequency of each word in the input file and prints the frequency table to the console (stdout). Something like:
| Word | Frequency |
|---|---|
| a | 23 |
| the | 13 |
| and | 9 |
| you | 5 |
| ... | ... |
Notice that as words are read from the file, we will need to find them efficiently so we can add to their frequency count. If we imagine an "entry" in a search tree having a char* key and an int count, then we can store the words in a search tree. This gives us a fast, simple way to find a word and update its counter. (Note: we may want to add a specialized function to abstract the operation of searching for a word and adding one to its counter; note, though, that this is part of the application code, not really part of the Map ADT.)
In addition, notice that the frequency table shown above is printed in descending order, by frequency. This makes sense for creating a word cloud, where we are most interested in high-frequency words. Two solutions come to mind for achieving this: