distantreading / WG1

Discussion documents and working papers from WG1

minimum length criterion #7

Closed: lb42 closed this issue 6 years ago

lb42 commented 6 years ago

At one point the sampling document says the minimum length should be 10,000 words. Then it proposes a length scale: short (less than 5000 words), medium (5000 to 20000 words), long (more than 20000 words).

So which one should it be?

Ondelli commented 6 years ago

I think it's simply a case of repeated typos: the thresholds should read 50,000 and 200,000 words, as specified elsewhere: "at least 20% are short novels (10-50k word tokens), at least 20% are long novels (>200k word tokens)".

lb42 commented 6 years ago

Please correct the typos! Are we proposing a two-band scale or a three-band one?

CarolinOdebrecht commented 6 years ago

Thanks for raising this issue! We adjusted this criterion during the WG meeting in Prague and decided to use the following: the thresholds should read 50,000 and 200,000 words, as specified elsewhere: at least 20% are short novels (10-50k word tokens), and at least 20% are long novels (>200k word tokens). I updated the sampling document accordingly.
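
For anyone checking a candidate collection against this criterion, here is a minimal sketch of how the bands and the 20% quotas could be tested. It is only an illustration, not part of the sampling document: the function names, the "medium" band for everything between 50k and 200k tokens, the treatment of exactly 50,000 tokens as "short", and the choice to compute shares over eligible novels only are all assumptions.

```python
from collections import Counter

MIN_TOKENS = 10_000   # minimum length stated in the thread
SHORT_MAX = 50_000    # short novels: 10-50k word tokens
LONG_MIN = 200_000    # long novels: more than 200k word tokens


def length_band(tokens: int) -> str:
    """Return the length band for a novel with the given word-token count."""
    if tokens < MIN_TOKENS:
        return "too_short"   # below the minimum length criterion
    if tokens <= SHORT_MAX:
        return "short"       # assumption: exactly 50,000 counts as short
    if tokens <= LONG_MIN:
        return "medium"      # assumption: implied middle band (50k-200k)
    return "long"


def meets_length_quota(token_counts, min_share=0.20):
    """Check that short and long novels each make up at least min_share.

    Assumption: shares are computed over eligible novels only, i.e. those
    that meet the minimum length.
    """
    bands = [length_band(n) for n in token_counts if n >= MIN_TOKENS]
    if not bands:
        return False
    counts = Counter(bands)
    total = len(bands)
    return (counts["short"] / total >= min_share
            and counts["long"] / total >= min_share)
```

For example, `meets_length_quota([20_000, 30_000, 100_000, 250_000, 300_000])` passes because two of the five novels are short and two are long, whereas a collection with one short and one long novel out of six does not.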