cse442-fall-2019-offering / 442projects-team-rydat

442projects-team-rydat created by GitHub Classroom

Text Feature Extractor #39

Open dakota0064 opened 5 years ago

dakota0064 commented 5 years ago

Create a class capable of determining the words present in a corpus. It must be able to isolate the top k most frequent of these words to use as a vocabulary. It must then replace every word outside that vocabulary with an arbitrary placeholder token such as "UNK".
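A minimal sketch of what such a class might look like, in case it helps scope the task. This assumes Python, and the class name `TextFeatureExtractor`, the `fit`/`transform` method names, and the lowercase regex tokenizer are all hypothetical choices for illustration, not anything the team has agreed on:

```python
import re
from collections import Counter


class TextFeatureExtractor:
    """Hypothetical sketch: top-k vocabulary with an unknown-word token."""

    def __init__(self, k, unk_token="UNK"):
        self.k = k                  # vocabulary size
        self.unk_token = unk_token  # placeholder for out-of-vocabulary words
        self.vocabulary = []        # top-k words, most frequent first
        self._vocab_set = set()     # same words, for fast membership checks

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase, then keep runs of letters, digits, apostrophes.
        return re.findall(r"[a-z0-9']+", text.lower())

    def fit(self, corpus):
        # Count every word across all documents in the corpus.
        counts = Counter()
        for document in corpus:
            counts.update(self.tokenize(document))
        # Keep only the k most frequent words as the vocabulary.
        self.vocabulary = [word for word, _ in counts.most_common(self.k)]
        self._vocab_set = set(self.vocabulary)
        return self

    def transform(self, text):
        # Replace any word outside the vocabulary with the unknown token.
        return [
            word if word in self._vocab_set else self.unk_token
            for word in self.tokenize(text)
        ]


corpus = ["the cat sat on the mat", "the dog sat"]
extractor = TextFeatureExtractor(k=2).fit(corpus)
labeled = extractor.transform("the cat sat")
```

With this toy corpus, "the" appears three times and "sat" twice, so they form the vocabulary and `labeled` comes out as `["the", "UNK", "sat"]`. How words are tokenized (case, punctuation, stemming) is left open by the task description and would need to be decided separately.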

dakota0064 commented 5 years ago

Task Test: