KorAP / Kalamar

:octopus: Mojolicious-based Frontend for KorAP
BSD 2-Clause "Simplified" License

Add "Export token frequency list" function to corpus statistics #163

Open kupietz opened 2 years ago

kupietz commented 2 years ago

Token (unigram) frequency lists are essential for comparing corpora and for deriving the most typical (key) words.

The frequency list might need to be truncated at some minimum frequency (for license reasons), but it should probably still state the total token count, either in the file name or in a comment. The lists should contain tab-separated values ordered by decreasing frequency.
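
A minimal sketch of the proposed output format, assuming the tokens are already available as an iterable of strings and contain no tabs or newlines. The function name `write_frequency_list`, the `min_freq` parameter and the `# total tokens:` comment line are hypothetical illustrations, not an existing KorAP/Kalamar API. The total is counted before truncation, so relative frequencies can still be computed from a truncated list.

```python
import io
from collections import Counter
from typing import Iterable, TextIO


def write_frequency_list(tokens: Iterable[str], out: TextIO, min_freq: int = 1) -> int:
    """Write a tab-separated token frequency list (hypothetical export format).

    Returns the total token count, which is computed before truncation.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    # Assumed convention: the untruncated total goes into a leading comment line.
    out.write(f"# total tokens: {total}\n")
    # Decreasing frequency; ties broken alphabetically for reproducible output.
    for token, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq < min_freq:
            break  # list is sorted, so every remaining entry is below the cutoff
        out.write(f"{token}\t{freq}\n")
    return total


buf = io.StringIO()
write_frequency_list("a b a c a b".split(), buf, min_freq=2)
print(buf.getvalue(), end="")
```

With `min_freq=2` the singleton `c` is dropped from the rows, but the comment line still reports all 6 tokens. Putting the total into the file name instead would work the same way; the comment line just keeps it attached to the data.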
