Implement Guiraud’s Index

dpalmasan / TRUNAJOD2.0

An easy-to-use library to extract indices from texts.

https://trunajod20.readthedocs.io/en/latest/

MIT License


Implement Guiraud’s Index #26

Open dpalmasan opened 3 years ago

dpalmasan commented 3 years ago

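This is a lexical diversity measure that, unlike the plain type-token ratio, divides the number of distinct words by the square root of the total word count rather than by the word count itself, which is intended to reduce its sensitivity to text length. It is computed as: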

$GI=\displaystyle\frac{v}{\sqrt{N}}$

Where $v$ is the number of distinct words in the text, and $N$ is the total number of words in the text.
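A minimal Python sketch of the formula. It assumes the text has already been tokenized into a list of word strings and that words are compared case-insensitively; the issue specifies neither tokenization nor case handling, and the function name `guiraud_index` is hypothetical.

```python
import math
from typing import Iterable


def guiraud_index(words: Iterable[str]) -> float:
    """Guiraud's index: distinct words (v) divided by sqrt(total words, N).

    Assumes `words` is an already-tokenized sequence of word strings.
    Words are lower-cased before counting distinct types (an assumption).
    """
    tokens = [w.lower() for w in words]
    n = len(tokens)  # N: total number of words
    if n == 0:
        # The index is undefined for empty input; returning 0.0 is an arbitrary choice.
        return 0.0
    v = len(set(tokens))  # v: number of distinct words
    return v / math.sqrt(n)


# "The" and "the" collapse to one type, so v = 3 and N = 4: GI = 3 / 2
print(guiraud_index(["The", "cat", "saw", "the"]))  # prints 1.5
```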

Unit tests should be added as well.
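A possible shape for those tests, sketched with the standard-library `unittest` module. The `guiraud_index` function is redefined here so the block runs on its own; its name, its case folding, and its behavior on empty input are assumptions, not the project's actual API.

```python
import math
import unittest


def guiraud_index(words):
    """Hypothetical implementation under test: v / sqrt(N), case-insensitive."""
    tokens = [w.lower() for w in words]
    if not tokens:
        return 0.0  # assumed convention for empty input
    return len(set(tokens)) / math.sqrt(len(tokens))


class GuiraudIndexTest(unittest.TestCase):
    def test_all_distinct(self):
        # v = N = 4, so GI = 4 / sqrt(4) = 2.0
        self.assertAlmostEqual(guiraud_index(["a", "b", "c", "d"]), 2.0)

    def test_repeated_words(self):
        # v = 3 ("the", "cat", "sat"), N = 5
        words = ["the", "cat", "sat", "the", "cat"]
        self.assertAlmostEqual(guiraud_index(words), 3 / math.sqrt(5))

    def test_case_insensitive(self):
        # "Dog" and "dog" should count as the same type
        self.assertEqual(guiraud_index(["Dog", "dog"]), guiraud_index(["dog", "dog"]))

    def test_empty_input(self):
        self.assertEqual(guiraud_index([]), 0.0)
```

Saved as a test module, these can be run with `python -m unittest`.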

Docs should be updated as well, adding the following reference:

@misc{herdan1961problemes,
  title={Probl{\`e}mes et m{\'e}thodes de la statistique linguistique},
  author={Herdan, Gustav},
  year={1961},
  publisher={JSTOR}
}

supersonic1999 commented 3 years ago

Looking into this now; would you suggest adding this as a function to a pre-existing file or creating a separate one?

dpalmasan commented 3 years ago

Hello! Sure, this should go into the src/ttr.py file, as it is a TTR metric. Let me know if you have more questions!
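Because Guiraud's index is built from the same two counts as the plain type-token ratio (TTR = v / N), it follows that GI = TTR · √N, which is one reason it fits in a TTR module. The sketch below shows the two side by side; the function names and the plain list-of-strings input are hypothetical and are not taken from TRUNAJOD's actual `src/ttr.py`.

```python
import math
from typing import Sequence, Tuple


def _type_token_counts(words: Sequence[str]) -> Tuple[int, int]:
    """Return (v, N): distinct words and total words (hypothetical helper)."""
    return len(set(words)), len(words)


def type_token_ratio(words: Sequence[str]) -> float:
    """Plain TTR = v / N (hypothetical signature)."""
    v, n = _type_token_counts(words)
    return v / n if n else 0.0


def guiraud_index(words: Sequence[str]) -> float:
    """Guiraud's index = v / sqrt(N) (hypothetical signature)."""
    v, n = _type_token_counts(words)
    return v / math.sqrt(n) if n else 0.0


words = "a rose is a rose is a rose".split()  # N = 8, v = 3
ttr = type_token_ratio(words)
gi = guiraud_index(words)
# GI equals TTR scaled by sqrt(N)
print(ttr, gi, math.isclose(gi, ttr * math.sqrt(len(words))))
```

Sharing one counting helper keeps both metrics consistent about what counts as a word; whether the real module lower-cases or lemmatizes tokens first is not stated in the issue.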

supersonic1999 commented 3 years ago

Thanks! I just pushed the code to #49.

I'm going to have to do some more research on GitHub; I'm not sure whether I've pushed that correctly... cheers!