dpalmasan / TRUNAJOD2.0

An easy-to-use library to extract indices from texts.
https://trunajod20.readthedocs.io/en/latest/
MIT License
29 stars 7 forks

Implement D estimate #29

Closed dpalmasan closed 3 years ago

dpalmasan commented 3 years ago

The D estimate measures lexical diversity using a non-linear model. It is computed by the following procedure:

  1. Take a random sample of words from the text.
  2. Calculate the TTR (type-token ratio) of the sample.
  3. Find the value of D that best fits the following equation:

Using numpy is allowed, since it is already installed as a dependency of spaCy, which TRUNAJOD requires.
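
A minimal numpy-only sketch of the procedure follows. It assumes the equation in step 3 is the curve commonly used for the D measure (vocd-D), `TTR = (D / N) * (sqrt(1 + 2 * N / D) - 1)`, where `N` is the sample size. The function names, the sample sizes (35 to 50 tokens), the 100 trials per size, and the grid search used for the fit are all assumptions made for illustration. None of them are part of TRUNAJOD's API.

```python
import numpy as np


def ttr_model(n, d):
    """Model TTR for sample size n and diversity parameter d (assumed vocd-D curve)."""
    n = np.asarray(n, dtype=float)
    return (d / n) * (np.sqrt(1.0 + 2.0 * n / d) - 1.0)


def mean_sampled_ttr(tokens, sample_size, trials, rng):
    """Average TTR over `trials` random samples drawn without replacement."""
    tokens = np.asarray(tokens)
    ttrs = [
        np.unique(rng.choice(tokens, size=sample_size, replace=False)).size
        / sample_size
        for _ in range(trials)
    ]
    return float(np.mean(ttrs))


def fit_d(sizes, ttrs, candidates=None):
    """Grid-search the d that minimizes squared error against the model.

    A grid search stands in for a proper non-linear least-squares solver
    so that only numpy is needed.
    """
    if candidates is None:
        candidates = np.arange(1.0, 200.0, 0.1)  # hypothetical search range
    sizes = np.asarray(sizes, dtype=float)
    ttrs = np.asarray(ttrs, dtype=float)
    # Rows are candidate d values, columns are sample sizes.
    predicted = ttr_model(sizes[None, :], candidates[:, None])
    errors = ((predicted - ttrs[None, :]) ** 2).sum(axis=1)
    return float(candidates[np.argmin(errors)])


def d_estimate(tokens, min_size=35, max_size=50, trials=100, seed=0):
    """Estimate D for a list of tokens (hypothetical helper, not TRUNAJOD API)."""
    if len(tokens) < max_size:
        raise ValueError("text is shorter than the largest sample size")
    rng = np.random.default_rng(seed)
    sizes = np.arange(min_size, max_size + 1)
    ttrs = [mean_sampled_ttr(tokens, int(n), trials, rng) for n in sizes]
    return fit_d(sizes, ttrs)
```

Because the fit is a grid search, the result is bounded by the candidate range and is only as precise as the grid step. A real implementation would probably widen the range or use a dedicated optimizer.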