parser.py doesn't handle "esse" verbs

blagae / whitakers_words


Hi guys,

Thanks for writing this great library. I'd like to lend a hand by helping to implement support for "esse" verbs in parser.py.

Right now these are being parsed as UniqueLexeme types. For example, parser.parse('sum').get_analyses()[0] will return the following:

{'lexeme': {'id': 0, 'category': [], 'roots': [], 'senses': ['to be, exist', 'also used to form verb perfect passive tenses with NOM PERF PPL'], 'wordType': <WordType.V: 'Verb'>}, 'root': '', 'inflections': [{'wordType': <WordType.V: 'Verb'>, 'stem': 'sum', 'affix': '', 'features': {'Tense': <Tense.PRES: 'Praesens'>, 'Voice': <Voice.ACTIVE: 'Active'>, 'Mood': <Mood.IND: 'Indicative'>, 'Person': <Person.1: 1>, 'Number': <Number.S: 'Singular'>}}], 'enclitic': None}

And type(parser.parse('sum').get_analyses()[0].lexeme) will return UniqueLexeme.

However, I also noticed that there is an esse.py file in the data directory that isn't referenced anywhere else. Was there a plan to integrate it into parser.py?

Hi,

The esse.py file is not used in the application itself, but it is loaded by the datagenerator script into the uniques dictionary, which makes it available for the default DataLayer. It is in the data folder for that exact purpose.

You can find the code at https://github.com/blagae/whitakers_words/blob/master/whitakers_words/datagenerator.py#L170. After it has run (with pip install), you can look at the generated file generated/uniques.py and find the conjugated forms of esse listed there. There is also a test in Test_Uniques.py that checks that the forms are parsed correctly (as UniqueLexeme, like you mentioned).
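To make the idea concrete, here is a rough sketch of how a generator step can turn a table of irregular forms into a lookup dictionary and write it out as Python source. This is not the actual datagenerator code: ESSE_FORMS, build_uniques, and render_module are made-up names, and the real layout of esse.py and generated/uniques.py may well differ.

```python
from pprint import pformat

# Hypothetical stand-in for the conjugation table kept in a data file.
# The real esse.py may use a different structure and different keys.
ESSE_FORMS = [
    {"orth": "sum", "tense": "PRES", "mood": "IND", "person": 1, "number": "S"},
    {"orth": "es", "tense": "PRES", "mood": "IND", "person": 2, "number": "S"},
    {"orth": "est", "tense": "PRES", "mood": "IND", "person": 3, "number": "S"},
]


def build_uniques(forms):
    """Group irregular forms by spelling so that a lookup is a plain
    dictionary access, with no stem/ending analysis involved."""
    uniques = {}
    for form in forms:
        # Everything except the spelling itself becomes the stored entry.
        entry = {key: value for key, value in form.items() if key != "orth"}
        uniques.setdefault(form["orth"], []).append(entry)
    return uniques


def render_module(uniques):
    """Render the dictionary as Python source text, the way a generator
    script could write it into a generated module."""
    return "uniques = " + pformat(uniques) + "\n"


uniques = build_uniques(ESSE_FORMS)
source = render_module(uniques)
```

In this sketch, looking up `uniques["sum"]` returns the stored features directly. That is the sense in which forms handled this way bypass the regular stem-and-ending parsing and surface as a separate "unique" entry type.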

My long-term ambition is to support more DataLayer implementations, most likely a few database integrations, to improve startup time, but that will take some work and I don't have the bandwidth for it right now.

Thanks for your question!

parser.py doesn't handle "esse" verbs #4