bjascob / pyInflect

A python module for word inflections designed for use with spaCy.
MIT License

Allow to use custom infl.csv and overrides.csv files #3

Closed D-Pavlov closed 4 years ago

D-Pavlov commented 4 years ago

It would be great to be able to extend the infl.csv and overrides.csv files, e.g. when a word is missing, it could just be appended to the file. One way to make this possible would be to let me specify paths to those files, so I can keep them somewhere accessible rather than inside the library folder, which shouldn't be touched. Currently, this isn't really supported. When I tried to initialize the Inflections object manually, I got an error. But maybe I'm just doing something wrong.

Code:

import spacy
from pyinflect import Inflections

nlp = spacy.load('en_core_web_sm')
inflections = Inflections('path/to/infl.csv', overrides_fn='path/to/overrides.csv')

Error:

ValueError: [E090] Extension 'inflect' already exists on Token. To overwrite the existing extension, set `force=True` on `Token.set_extension`.

bjascob commented 4 years ago

The problem you're running into is that, for simplicity, Inflections() is instantiated in the library's __init__.py, and when it's instantiated it adds itself to spaCy as an extension. This can only be done once, so when you try to instantiate it a second time, you get an error. As a quick fix, you can edit the following line in pyinflect/__init__.py so that it uses the filenames you want: INFLECTION_INST = Inflections(INFL_FN, OVERRIDES_FN)

Notice that INFLECTION_INST is used in the calls below that line. For a more permanent change, I might replace the code with something like the following:

INFLECTION_INST = None

def InflectionEngine(infl_fn=INFL_FN, overrides_fn=OVERRIDES_FN):
    # Create the shared instance on first use; later calls reuse it
    global INFLECTION_INST
    if INFLECTION_INST is None:
        INFLECTION_INST = Inflections(infl_fn, overrides_fn)
    return INFLECTION_INST

def getAllInflections(lemma, pos_type=None):
    return InflectionEngine().getAllInflections(lemma, pos_type)

def getAllInflectionsOOV(lemma, pos_type):
    return InflectionEngine().getAllInflectionsOOV(lemma, pos_type)

def getInflection(lemma, tag, inflect_oov=False):
    return InflectionEngine().getInflection(lemma, tag, inflect_oov)

This keeps the singleton behavior, but if you call InflectionEngine() with the appropriate filenames before you make any other calls, it'll load the files you're looking for instead of the default ones.
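To illustrate the call order this relies on, here is a minimal, self-contained sketch of the lazy-singleton idea. The Inflections class below is only a stand-in written for this example (the real one reads the CSV files and registers the spaCy extension), and the default filenames are placeholders, so treat it as a sketch of the pattern rather than the library's actual code.

```python
# Stand-in for pyinflect's Inflections class (hypothetical, for illustration
# only): it just records which files it was asked to load.
class Inflections:
    instances_created = 0

    def __init__(self, infl_fn, overrides_fn=None):
        Inflections.instances_created += 1
        self.infl_fn = infl_fn
        self.overrides_fn = overrides_fn


# Placeholder defaults; the real package defines its own paths.
INFL_FN = 'infl.csv'
OVERRIDES_FN = 'overrides.csv'

INFLECTION_INST = None


def InflectionEngine(infl_fn=INFL_FN, overrides_fn=OVERRIDES_FN):
    # Build the shared instance on the first call only.
    global INFLECTION_INST
    if INFLECTION_INST is None:
        INFLECTION_INST = Inflections(infl_fn, overrides_fn)
    return INFLECTION_INST


# The first call decides which files are used...
engine = InflectionEngine('path/to/infl.csv', overrides_fn='path/to/overrides.csv')

# ...and every later call returns that same instance, ignoring its arguments.
same_engine = InflectionEngine()
```

One consequence of this design is that passing different filenames on a later call has no effect, so the custom paths need to be supplied before anything else touches the engine.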

Alternatively, in pyinflect/Inflections.py, add force=True to the line that sets the extension, which is... spacy.tokens.Token.set_extension(

bjascob commented 4 years ago

Added force=True in commit 270983d5da956429015117784ebf8ecced584a36 to allow for re-definition of spacy extensions.