Closed lb42 closed 6 years ago
I think it's simply a case of repeated typos: it should read 50,000 and 200,000 words, as specified elsewhere: at least 20% are short novels (10-50k word tokens), at least 20% are long novels (>200k word tokens).
Please correct the typos! Are we proposing a two-band scale or a three-band one?
Thanks for raising this issue! We adjusted this criterion during the WG meeting in Prague. We decided to use the following: it should read 50,000 and 200,000 words, as specified elsewhere: at least 20% are short novels (10-50k word tokens), at least 20% are long novels (>200k word tokens). I updated the sampling document accordingly.
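For anyone checking a collection against this criterion, here is a minimal sketch of how the bands could be applied. It rests on several assumptions not stated in the thread: the medium band is taken to be everything between 50k and 200k word tokens, the boundaries at 50k and 200k are treated as inclusive on the lower band, and the function names `length_band` and `meets_length_criterion` are hypothetical.

```python
def length_band(word_tokens: int) -> str:
    """Return the length band for a novel, given its word-token count."""
    if word_tokens < 10_000:
        return "below minimum"  # under the 10k minimum mentioned in the thread
    if word_tokens <= 50_000:
        return "short"          # 10-50k word tokens
    if word_tokens <= 200_000:
        return "medium"         # assumed: everything between short and long
    return "long"               # >200k word tokens


def meets_length_criterion(token_counts: list[int]) -> bool:
    """Check that at least 20% of novels are short and at least 20% are long."""
    bands = [length_band(n) for n in token_counts]
    total = len(bands)
    if total == 0:
        return False
    # Integer comparison (5 * count >= total) avoids floating-point rounding at exactly 20%.
    enough_short = 5 * bands.count("short") >= total
    enough_long = 5 * bands.count("long") >= total
    return enough_short and enough_long
```

With five novels of 12,000, 60,000, 90,000, 120,000 and 250,000 word tokens, one is short and one is long, so each band holds exactly 20% and the check passes.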
At one point the sampling document says the minimum length should be 10,000 words. Then it proposes a length scale: short (less than 5000 words), medium (5000 to 20000 words), long (more than 20000 words).
So which one should it be?