czcorpus / cnc-masm

Manatee Assets, Services and Metadata - a complementary service for KonText

New function - generate random subcorpus of defined size #54

Open tomachalek opened 1 year ago

tomachalek commented 1 year ago

This is related to CNC's preflight subcorpora. It would allow for better random samples than the current solution, which simply takes the first N tokens of a corpus.