dpalmasan / TRUNAJOD2.0

An easy-to-use library to extract indices from texts.

https://trunajod20.readthedocs.io/en/latest/

MIT License

Implement D estimate #29

Closed dpalmasan closed 3 years ago

dpalmasan commented 3 years ago

The D estimate measures lexical diversity using a non-linear model. It is computed by the following procedure:

Take a random sample of $N$ words from the text.
Calculate the TTR (type-token ratio) of the sample.
Find the $D$ value that best fits the following equation:

$TTR=\frac{D}{N}\left[\sqrt{(1 + 2\frac{N}{D})}-1\right]$

Using numpy is allowed, since spaCy is already a dependency of TRUNAJOD and itself depends on numpy.
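
A minimal sketch of the procedure, not the implementation that was merged into TRUNAJOD. The function names `ttr_model` and `d_estimate` and the `seed` parameter are hypothetical. It assumes the text is already tokenized into a list of strings and that a single sample of $N$ words is used; for one $(N, TTR)$ pair the equation can be rearranged to $D = \frac{N \cdot TTR^2}{2(1 - TTR)}$, so no iterative fit is needed in that case.

```python
import numpy as np


def ttr_model(d, n):
    """TTR predicted by the model for diversity d and sample size n."""
    return (d / n) * (np.sqrt(1 + 2 * n / d) - 1)


def d_estimate(tokens, n, seed=None):
    """Estimate D from one random sample of n tokens (hypothetical helper).

    Assumption: tokens is a list of already-tokenized words.
    """
    if len(tokens) < n:
        raise ValueError("text has fewer than n tokens")

    # Step 1: draw n tokens at random, without replacement.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(tokens), size=n, replace=False)
    sample = [tokens[i] for i in idx]

    # Step 2: type-token ratio of the sample.
    ttr = len(set(sample)) / n

    # Step 3: solve the model equation for D.
    # When every sampled token is unique (TTR == 1) no finite D fits.
    if ttr >= 1.0:
        return float("inf")
    return n * ttr**2 / (2 * (1 - ttr))
```

A single sample gives a noisy estimate. Averaging the TTR over several random samples, or fitting $D$ by least squares across several values of $N$, would be a natural extension, but the procedure above does not specify either.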