Pre-calculated corpus size lists required #20

gkunter / coquery

Coquery is a free corpus query tool for linguists, lexicographers, translators, and anybody who wishes to search and analyse a text corpus.

GNU General Public License v3.0

18 stars 4 forks

Originally reported by: gkunter (Bitbucket: gkunter, GitHub: gkunter)

ISSUE: Determining the size of a sub-corpus (e.g. a sub-corpus that contains only sources from one genre) can be very slow for a large corpus, because it requires a COUNT(*) query. This is a problem if we want to express relative frequencies (words per million), which depend on the sub-corpus size.
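To make the dependency on the sub-corpus size concrete, here is a minimal sketch of the per-million normalisation. The function name `per_million` is hypothetical and not taken from the Coquery code base; it only illustrates why the size has to be known for every sub-corpus that is queried.

```python
def per_million(hits: int, subcorpus_size: int) -> float:
    """Relative frequency: occurrences per million tokens of the sub-corpus."""
    if subcorpus_size <= 0:
        raise ValueError("sub-corpus size must be positive")
    return hits * 1_000_000 / subcorpus_size


# 150 hits in a sub-corpus of 2,500,000 tokens
print(per_million(150, 2_500_000))  # 60.0
```

The denominator is the value that currently has to be obtained with COUNT(*).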

SOLUTION: During corpus creation, produce a data table that stores the corpus size for every combination of source features. Sub-corpus sizes can then be looked up in this table instead of being counted with a SQL query over the whole corpus.
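The proposed lookup table could be sketched roughly as follows, using SQLite from the Python standard library. The schema is an assumption made for illustration only: the tables `corpus` and `source`, the columns `source_id`, `genre` and `year`, the table name `corpus_size`, and both helper functions are hypothetical and do not reflect Coquery's actual database layout.

```python
import sqlite3


def build_size_table(conn, features):
    """Run once at corpus creation: token counts per feature combination.

    `features` must be trusted column names of the (hypothetical) source
    table, because they are interpolated into the SQL text.
    """
    cols = ", ".join(features)
    conn.execute("DROP TABLE IF EXISTS corpus_size")
    conn.execute(
        f"CREATE TABLE corpus_size AS "
        f"SELECT {cols}, COUNT(*) AS n_tokens "
        f"FROM corpus JOIN source USING (source_id) "
        f"GROUP BY {cols}"
    )


def subcorpus_size(conn, **filters):
    """Look up a sub-corpus size by summing the matching pre-counted rows."""
    where = " AND ".join(f"{name} = ?" for name in filters) or "1"
    row = conn.execute(
        f"SELECT COALESCE(SUM(n_tokens), 0) FROM corpus_size WHERE {where}",
        tuple(filters.values()),
    ).fetchone()
    return row[0]


# Tiny in-memory corpus: 3 sources with 3, 2 and 4 tokens.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source (source_id INTEGER PRIMARY KEY, genre TEXT, year INTEGER);
    CREATE TABLE corpus (token_id INTEGER PRIMARY KEY, word TEXT, source_id INTEGER);
    """
)
conn.executemany(
    "INSERT INTO source VALUES (?, ?, ?)",
    [(1, "news", 2001), (2, "news", 2002), (3, "fiction", 2001)],
)
conn.executemany(
    "INSERT INTO corpus (word, source_id) VALUES (?, ?)",
    [("w", sid) for sid, n in ((1, 3), (2, 2), (3, 4)) for _ in range(n)],
)
build_size_table(conn, ["genre", "year"])

print(subcorpus_size(conn, genre="news"))  # 5
print(subcorpus_size(conn, year=2001))     # 7
```

The expensive COUNT(*) runs only once, when the table is built. Later lookups scan a table with one row per feature combination, which is typically far smaller than the token table.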

Bitbucket: https://bitbucket.org/gkunter/coquery/issue/20
