bacook17 / acronym

ACRONYM (Acronym CReatiON for You and Me)
MIT License
383 stars 31 forks

make this an importable lib #14

Open schlitzered opened 4 years ago

schlitzered commented 4 years ago

Hi, I find the tool pretty useful, and it would be nice if you could make this a lib with a stable interface that can be imported into other projects.

For this, I would suggest that the logic for choosing the "corpus" move into find_acronyms.
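To make the suggestion concrete, here is a minimal sketch of what choosing the corpus by name could look like from the caller's side. `KNOWN_CORPORA`, `resolve_corpus`, and `find_acronyms_by_name` are hypothetical names invented for illustration and are not part of the acronym package. The import path `acronym.acronym` and the three corpus names are the ones mentioned elsewhere in this thread.

```python
import importlib

# Short names for the NLTK corpora mentioned in this thread.
KNOWN_CORPORA = ("words", "brown", "gutenberg")


def resolve_corpus(name):
    """Map a short corpus name to the matching nltk.corpus reader.

    Hypothetical helper, not part of the acronym package.
    """
    if name not in KNOWN_CORPORA:
        raise ValueError(
            f"unknown corpus {name!r}; expected one of {', '.join(KNOWN_CORPORA)}"
        )
    # Import lazily so NLTK is only needed once a corpus is actually resolved.
    corpus_module = importlib.import_module("nltk.corpus")
    return getattr(corpus_module, name)


def find_acronyms_by_name(phrase, corpus_name="words", **kwargs):
    """Hypothetical wrapper: pick the corpus by name, then delegate."""
    # Import path as used in this thread; imported here to keep the sketch lazy.
    from acronym.acronym import find_acronyms

    return find_acronyms(phrase, resolve_corpus(corpus_name), **kwargs)
```

With something like this, a caller would only pass a string such as "gutenberg" and would not need to import nltk directly. Whether the default should be "words" is a design choice left open here.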

mshemuni commented 1 year ago

I'd say it already is. Just looking at the code, one can see that acronym can be imported and used like this:

import nltk
from acronym.acronym import find_acronyms

# The corpus data must be available locally, e.g. via nltk.download("gutenberg")
find_acronyms("Hello World", nltk.corpus.gutenberg, min_length=2)

Output:


Collecting word corpus
Identifying matching acronyms
Process Complete
        long_version  score
acronym
HOWL     HellO WorLd     18
HEW      HEllo World     15
HOOD     HellO wOrlD     15
HOW      HellO World     15
HELD     HEllo worLD     13
HERD     HEllo woRlD     13
HOLD     HellO worLD     13
HOD      HellO worlD     10
HOO      HellO wOrld     10
HER      HEllo woRld      8
HOR      HellO woRld      8
HO       Hello wOrld      5

see: https://github.com/bacook17/acronym/blob/584c84497f8ed8bac66e246e9b9e52b4ea86391b/acronym/acronym.py#L102

One can change the corpus, for example to:

  1. nltk.corpus.words
  2. nltk.corpus.brown
  3. nltk.corpus.gutenberg

Do not forget to change the max and min length. In my example, 5 was too long and the output was an empty DataFrame.
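Since an unsuitable length range silently yields an empty result, one option is to try several ranges in turn. The following is only a sketch: `first_non_empty` is a hypothetical helper, and it assumes the finder accepts `min_length` and `max_length` keyword arguments (only `min_length` is confirmed by the call above) and returns something that supports `len()`, such as a DataFrame.

```python
def first_non_empty(finder, phrase, corpus, length_ranges):
    """Try each (min_length, max_length) pair until finder returns rows.

    Hypothetical helper. `finder` is expected to behave like find_acronyms;
    the max_length keyword is an assumption about its signature.
    """
    for min_len, max_len in length_ranges:
        result = finder(phrase, corpus, min_length=min_len, max_length=max_len)
        # len() of a DataFrame is its row count, so 0 means no acronyms found.
        if len(result) > 0:
            return (min_len, max_len), result
    # No range produced any rows.
    return None, None
```

Usage would look like `first_non_empty(find_acronyms, "Hello World", nltk.corpus.gutenberg, [(5, 7), (2, 4)])`, which falls back to the shorter range when the longer one finds nothing.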