Create a class that determines which words are present in a corpus. It must select the k most frequent of these words as the vocabulary and then replace every remaining word with a placeholder token such as "UNK".
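The task above can be sketched as follows. This is a minimal sketch, not the contents of "feature_extractor.py". It assumes lowercase whitespace tokenization, reserves index 0 for "UNK", and treats a "vector encoding" as a list of word indices. The class and method names (`VocabularyBuilder`, `fit`, `label`, `encode`) are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


class VocabularyBuilder:
    """Hypothetical sketch: top-k vocabulary, with an UNK token for every other word."""

    def __init__(self, k: int, unk_token: str = "UNK") -> None:
        self.k = k
        self.unk_token = unk_token
        self.vocab: List[str] = []
        self.word_to_index: Dict[str, int] = {}

    @staticmethod
    def tokenize(sentence: str) -> List[str]:
        # Assumption: lowercase + whitespace split; the real tokenizer is unspecified.
        return sentence.lower().split()

    def fit(self, corpus: Iterable[str]) -> "VocabularyBuilder":
        # Count every word occurring anywhere in the corpus.
        counts = Counter(w for sentence in corpus for w in self.tokenize(sentence))
        # Keep the k most frequent words; equal counts keep first-seen order.
        self.vocab = [w for w, _ in counts.most_common(self.k)]
        # Index 0 is reserved for UNK; vocabulary words get indices 1..k.
        self.word_to_index = {self.unk_token: 0}
        for i, w in enumerate(self.vocab, start=1):
            self.word_to_index[w] = i
        return self

    def label(self, sentence: str) -> List[str]:
        # Replace every out-of-vocabulary word with the UNK token.
        return [w if w in self.word_to_index else self.unk_token
                for w in self.tokenize(sentence)]

    def encode(self, sentence: str) -> List[int]:
        # Map each word to its index, falling back to 0 (UNK).
        return [self.word_to_index.get(w, 0) for w in self.tokenize(sentence)]


if __name__ == "__main__":
    # Hypothetical demo: print the inputs, the vocabulary, and the encoded vectors.
    sentences = [
        "the cat sat on the mat",
        "the dog sat on the log",
        "a cat and a dog",
    ]
    builder = VocabularyBuilder(k=3).fit(sentences)
    print("Vocabulary:", builder.vocab)
    for s in sentences:
        print(s, "->", builder.label(s), builder.encode(s))
```

With k=3 the vocabulary is ['the', 'cat', 'sat'], so "the cat sat on the mat" encodes to [1, 2, 3, 0, 1, 0]. Ties at the cutoff are resolved by `Counter.most_common`, which keeps words with equal counts in first-seen order. Another tie-breaking rule (for example, alphabetical) would change which words survive.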
Run the main function in "feature_extractor.py". It prints sample input sentences, the vocabulary extracted from those inputs, and their vector-encoded versions.