Token (unigram) frequency lists are essential for comparing corpora and for deriving the most typical (key) words.
The frequency list may need to be truncated at some minimum frequency (for license reasons), but it should probably still state the total token count, either in the file name or in a comment. The lists should contain tab-separated values, ordered by decreasing frequency.
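A minimal sketch of such a list, assuming naive whitespace tokenization, a leading `#` comment line for the total token count, and a hypothetical `min_freq` cutoff parameter. None of these choices come from a fixed standard. The total is computed before truncation so that it still describes the full corpus, and ties are broken alphabetically only to make the output deterministic.

```python
from collections import Counter


def frequency_list_tsv(text: str, min_freq: int = 1) -> str:
    """Build a unigram frequency list as tab-separated text.

    Assumptions (illustrative only): tokens are whitespace-separated,
    the total token count goes in a leading '#' comment line, and ties
    are broken alphabetically so the output is deterministic.
    """
    tokens = text.split()  # naive whitespace tokenization (assumption)
    counts = Counter(tokens)
    # Total is taken before truncation, so it still describes the full corpus.
    total = sum(counts.values())
    # Decreasing frequency; alphabetical order among equal frequencies.
    rows = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    lines = [f"# total_tokens\t{total}"]
    # Truncate at the minimum frequency (e.g. for license reasons).
    lines += [f"{tok}\t{freq}" for tok, freq in rows if freq >= min_freq]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "the cat sat on the mat the end"
    print(frequency_list_tsv(sample, min_freq=2), end="")
```

With `min_freq=2` only `the` (3 occurrences) survives the cutoff, while the comment line still reports all 8 tokens. The total could instead be encoded in the output file name; the comment line simply keeps it attached to the data.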