LIAAD / yake

Single-document unsupervised keyword extraction
https://liaad.github.io/yake

Configurable term length and terms containing numbers #68

Closed: tdxius closed this issue 2 years ago

tdxius commented 2 years ago

In general, I'm very happy with the keyword extractions I get from this library, but I faced a few challenges:

Would it be possible to make it a CLI option? Something like the example below?

Usage: yake [OPTIONS]

Options:
    -mt, --min-term-length INTEGER    Minimum length of a term that is extracted
    -tn, --term-contains-numbers      Extracted terms can contain numbers

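To make the proposal concrete, here is a minimal sketch of how the two flags could be parsed. It uses Python's standard argparse module, not yake's actual CLI code. The flag names come from the usage text above; the function name build_parser and the default values (1 and off) are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the two proposed options (not part of yake)."""
    parser = argparse.ArgumentParser(prog="yake")
    parser.add_argument(
        "-mt", "--min-term-length",
        type=int,
        default=1,  # assumed default: no minimum beyond one character
        help="Minimum length of a term that is extracted",
    )
    parser.add_argument(
        "-tn", "--term-contains-numbers",
        action="store_true",  # assumed default: off, matching current behaviour
        help="Extracted terms can contain numbers",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.min_term_length, args.term_contains_numbers)
```

Running it with `-mt 3 -tn` would set min_term_length to 3 and term_contains_numbers to True; how those values are then passed into the extractor is left open here.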
rncampos commented 2 years ago

Thanks for your interest in yake, and thanks for pointing this out. We do have a tagsToDiscard parameter on the DataCore class (datarepresentation.py file), which by default is set to discard "d" (digits or numbers). We should have exposed this option to users from the start through the public entry point, yake.KeywordExtractor, but we haven't done so. We are looking for students interested in addressing some of yake's limitations. In the meantime, you can fork the code and adapt it for your own purposes (including making that option available on the CLI).
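The idea of a discard set keyed on token tags can be illustrated with a small, self-contained sketch. This is a simplified stand-in and not yake's actual implementation: the functions tag_token and candidate_tokens, the whitespace tokenization, and the two-tag scheme ("d" for numeric tokens, "p" for everything else) are all assumptions made for illustration.

```python
from typing import FrozenSet, List


def tag_token(token: str) -> str:
    """Tag a token as 'd' if it looks numeric, otherwise 'p' (simplified)."""
    # Drop thousands separators and at most one decimal point before checking.
    stripped = token.replace(",", "").replace(".", "", 1)
    return "d" if stripped.isdigit() else "p"


def candidate_tokens(
    text: str,
    tags_to_discard: FrozenSet[str] = frozenset({"d"}),
) -> List[str]:
    """Keep only tokens whose tag is not in tags_to_discard."""
    return [tok for tok in text.split() if tag_token(tok) not in tags_to_discard]


if __name__ == "__main__":
    sample = "released in 2024 with 3 fixes"
    print(candidate_tokens(sample))               # numeric tokens dropped
    print(candidate_tokens(sample, frozenset()))  # numeric tokens kept
```

Passing an empty set keeps numeric tokens, which is the kind of switch the requested option would expose to callers.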

tdxius commented 2 years ago

Thank you for the quick response and the explanation. Would you be willing to include this if I create a PR?

rncampos commented 2 years ago

You're welcome. Sure, we will include it. It would also be a good idea to expose it as a parameter when calling the function (in addition to having it on the CLI). Best