
cse442-fall-2019-offering / 442projects-team-rydat

442projects-team-rydat created by GitHub Classroom


Text Feature Extractor #39

Open: dakota0064 opened this issue 5 years ago

dakota0064 commented 5 years ago

Create a class that can determine the words present in a corpus. It must be able to isolate the k most frequent of these words to use as a vocabulary, and it must then replace every other word with an arbitrary placeholder token such as "UNK".
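A minimal sketch of such a class follows. It assumes lowercasing and whitespace tokenization, and the names `FeatureExtractor`, `fit`, and `replace_unknown` are hypothetical; the issue does not specify a tokenizer or an interface. It uses `collections.Counter.most_common(k)` to select the top k words, which orders tied counts by first appearance.

```python
from collections import Counter
from typing import Iterable, List


class FeatureExtractor:
    """Hypothetical sketch: builds a top-k vocabulary and maps other words to UNK.

    Lowercasing and whitespace tokenization are assumptions, not requirements
    stated in the issue.
    """

    def __init__(self, k: int, unk_token: str = "UNK") -> None:
        self.k = k
        self.unk_token = unk_token
        self.vocab: List[str] = []

    @staticmethod
    def tokenize(sentence: str) -> List[str]:
        # Assumed tokenizer: lowercase, then split on whitespace.
        return sentence.lower().split()

    def fit(self, corpus: Iterable[str]) -> List[str]:
        # Count every word that appears anywhere in the corpus.
        counts = Counter(w for sentence in corpus for w in self.tokenize(sentence))
        # Keep the k most frequent words; ties keep first-encountered order.
        self.vocab = [word for word, _ in counts.most_common(self.k)]
        return self.vocab

    def replace_unknown(self, sentence: str) -> List[str]:
        # Keep in-vocabulary words and label every other word with the UNK token.
        known = set(self.vocab)
        return [w if w in known else self.unk_token for w in self.tokenize(sentence)]
```

For example, fitting with `k=2` on `["the cat sat", "the dog sat", "a bird flew"]` yields the vocabulary `["the", "sat"]`, after which `replace_unknown("The bird sat")` returns `["the", "UNK", "sat"]`.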

dakota0064 commented 5 years ago

Task Test:

Run the main function in "feature_extractor.py". It prints sample input sentences, the vocabulary extracted from those inputs, and the vector-encoded version of each sentence.
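The issue does not say what the "vector encoded versions" look like. The sketch below assumes a bag-of-words count vector over the top-k vocabulary plus one extra slot that counts UNK words. The functions `build_vocab`, `encode`, and `main` and the sample sentences are all hypothetical illustrations of a test run, not the contents of the actual script.

```python
from collections import Counter
from typing import List

UNK = "UNK"


def build_vocab(sentences: List[str], k: int) -> List[str]:
    # Top-k most frequent words (whitespace tokenization is an assumption),
    # followed by a final slot reserved for the UNK token.
    counts = Counter(w for s in sentences for w in s.lower().split())
    return [w for w, _ in counts.most_common(k)] + [UNK]


def encode(sentence: str, vocab: List[str]) -> List[int]:
    # Assumed encoding: bag-of-words counts, with out-of-vocabulary words
    # accumulated in the UNK slot.
    index = {w: i for i, w in enumerate(vocab)}
    vector = [0] * len(vocab)
    for w in sentence.lower().split():
        vector[index.get(w, index[UNK])] += 1
    return vector


def main() -> None:
    # Hypothetical sample inputs standing in for the script's real data.
    sentences = ["the cat sat", "the dog sat", "a bird flew"]
    vocab = build_vocab(sentences, k=2)
    print("Sentences:", sentences)
    print("Vocabulary:", vocab)
    for s in sentences:
        print(s, "->", encode(s, vocab))


if __name__ == "__main__":
    main()
```

With these inputs the vocabulary is `["the", "sat", "UNK"]`, and "a bird flew" encodes to `[0, 0, 3]` because all three of its words fall outside the top two. An index sequence, one ID per token, would be an equally plausible reading of the issue.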