dpalmasan / TRUNAJOD2.0

An easy-to-use library to extract indices from texts.
https://trunajod20.readthedocs.io/en/latest/
MIT License

Implement Guiraud’s Index #26

Open: dpalmasan opened this issue 3 years ago

dpalmasan commented 3 years ago

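This is a lexical diversity measure that divides the number of distinct words by the square root of the total number of words, which makes it less sensitive to text length than the plain type-token ratio (TTR). It is computed as:

$$R = \frac{V}{\sqrt{N}}$$

where $V$ is the number of distinct words (types) in the text and $N$ is the total number of words (tokens) in the text.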
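A minimal Python sketch of the formula, assuming the text has already been tokenized into a list of words with no case folding or other normalization. The name `guiraud_index` and its `word_list` parameter are hypothetical; this thread does not show the final signature that lands in `src/ttr.py`.

```python
import math
from typing import List


def guiraud_index(word_list: List[str]) -> float:
    """Guiraud's index R = V / sqrt(N) (hypothetical sketch).

    V is the number of distinct words (types) and N is the total
    number of words (tokens) in ``word_list``.
    """
    if not word_list:
        # R is undefined for an empty text; raising is one possible choice.
        raise ValueError("word_list must contain at least one word")
    n_types = len(set(word_list))  # V: distinct words
    n_tokens = len(word_list)      # N: total words
    return n_types / math.sqrt(n_tokens)


tokens = ["the", "cat", "sat", "on", "the", "mat"]
print(round(guiraud_index(tokens), 4))  # 5 / sqrt(6) -> 2.0412
```

Raising on empty input is a design choice for this sketch; returning `0.0` instead would be equally reasonable if callers prefer not to handle exceptions.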

Unit tests should also be added.
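A possible shape for those tests, sketched with the standard-library `unittest` module. The project's actual test layout and runner are not shown in this thread, so the `guiraud_index` function is redefined here (hypothetically) to keep the example self-contained; in the repository the tests would import it from the TTR module instead. Saved as, for example, `test_guiraud.py`, it can be run with `python -m unittest test_guiraud`.

```python
import math
import unittest
from typing import List


def guiraud_index(word_list: List[str]) -> float:
    # Hypothetical function under test: R = V / sqrt(N).
    if not word_list:
        raise ValueError("word_list must contain at least one word")
    return len(set(word_list)) / math.sqrt(len(word_list))


class GuiraudIndexTest(unittest.TestCase):
    def test_all_distinct_words(self):
        # V = N = 4, so R = 4 / sqrt(4) = 2.
        self.assertAlmostEqual(guiraud_index(["a", "b", "c", "d"]), 2.0)

    def test_repeated_words(self):
        # V = 1, N = 4, so R = 1 / 2.
        self.assertAlmostEqual(guiraud_index(["a", "a", "a", "a"]), 0.5)

    def test_mixed_text(self):
        # "the" appears twice: V = 5, N = 6.
        tokens = ["the", "cat", "sat", "on", "the", "mat"]
        self.assertAlmostEqual(guiraud_index(tokens), 5 / math.sqrt(6))

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            guiraud_index([])
```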

The docs should also be updated to include the following reference:

@misc{herdan1961problemes,
  title={Probl{\`e}mes et m{\'e}thodes de la statistique linguistique},
  author={Herdan, Gustav},
  year={1961},
  publisher={JSTOR}
}

supersonic1999 commented 3 years ago

Looking into this now; would you suggest adding this as a function to a pre-existing file or creating a separate one?

dpalmasan commented 3 years ago

Hello! Sure, this should go into the src/ttr.py file, since it is a TTR metric. Let me know if you have more questions!

supersonic1999 commented 3 years ago

Thanks! I just pushed the code to #49.

I'm going to have to do some more research on GitHub; I'm not sure whether I've pushed that correctly... Cheers!