Closed 1over137 closed 1 month ago
I'd prefer to work towards releasing version 1 and see from there. That includes documenting how the sources are compiled, which I'm working on.
The API is not completely stable right now as a few things are still broken after an intensive refactoring. I'd suggest you wait with your PR until things have stabilized a bit. Using the new classes to load external dictionaries seems like a good approach.
@1over137 You can start working on a PR if you want, the API for dictionary lookup strategy is stable. I also added info in the training readme on additional dictionaries.
Hi guys,
Such an API is already there.
You just need to implement the DictionaryFactory protocol and use it to load your custom dictionaries.
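For readers who have not used protocol-based factories, here is a minimal, self-contained sketch of the idea. It does not import simplemma: the `DictionaryFactory` class below is a local stand-in, and the `get_dictionary` method name, the `CustomDictionaryFactory` class and the `lookup` helper are all assumptions made for illustration. Check the library's own documentation for the real protocol signature.

```python
from typing import Dict, Protocol


class DictionaryFactory(Protocol):
    """Local stand-in for the protocol; the method name is an assumption."""

    def get_dictionary(self, lang: str) -> Dict[str, str]:
        ...


class CustomDictionaryFactory:
    """Hypothetical factory serving user-supplied word -> lemma mappings."""

    def __init__(self, dictionaries: Dict[str, Dict[str, str]]) -> None:
        # Outer key: language code; inner mapping: inflected form -> lemma.
        self._dictionaries = dictionaries

    def get_dictionary(self, lang: str) -> Dict[str, str]:
        # Unknown languages yield an empty mapping instead of raising.
        return self._dictionaries.get(lang, {})


def lookup(token: str, lang: str, factory: DictionaryFactory) -> str:
    """Hypothetical helper: return the lemma if known, else the token itself."""
    return factory.get_dictionary(lang).get(token, token)


factory = CustomDictionaryFactory({"en": {"mice": "mouse", "geese": "goose"}})
print(lookup("mice", "en", factory))   # known form in a known language
print(lookup("mice", "xx", factory))   # unknown language, token is returned as-is
```

Because the factory is matched structurally, any object with a compatible `get_dictionary` method can be passed in, so no data inside the module needs to be patched.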
@1over137 Did that solve your problem or do we need to work on the documentation?
Closing as this was answered. Feel free to reopen if there are more questions.
It would be nice if the API provided a way of loading a custom dictionary without resorting to patching the data in the module. In some languages, the lemmatizer coverage can be rather poor, and other languages are not supported at all. If this is welcome and we can agree on what the API should look like, I can implement this and make a PR. My idea would be either passing a dict argument to simplemma.lemmatize, or keeping a global state that stores which extra dicts to use for each language, plus a few functions to manipulate it.
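To make the two options concrete, here is a rough sketch of what they might look like side by side. Nothing here is simplemma code: the `lemmatize` function below is a placeholder that only consults the extra dictionaries, and `register_extra_dict`, `clear_extra_dicts` and the `extra_dict` parameter are hypothetical names.

```python
from typing import Dict, Optional

# Option 2: global state holding extra dictionaries, keyed by language code.
_EXTRA_DICTS: Dict[str, Dict[str, str]] = {}


def register_extra_dict(lang: str, entries: Dict[str, str]) -> None:
    """Hypothetical: merge extra word -> lemma entries for one language."""
    _EXTRA_DICTS.setdefault(lang, {}).update(entries)


def clear_extra_dicts(lang: Optional[str] = None) -> None:
    """Hypothetical: drop the extra entries for one language, or for all."""
    if lang is None:
        _EXTRA_DICTS.clear()
    else:
        _EXTRA_DICTS.pop(lang, None)


def lemmatize(token: str, lang: str,
              extra_dict: Optional[Dict[str, str]] = None) -> str:
    """Placeholder lemmatizer, not the library function."""
    # Option 1: a dict passed per call takes precedence.
    if extra_dict is not None and token in extra_dict:
        return extra_dict[token]
    # Option 2: fall back to the globally registered entries.
    registered = _EXTRA_DICTS.get(lang, {})
    if token in registered:
        return registered[token]
    # A real implementation would consult the built-in data here.
    return token


register_extra_dict("en", {"mice": "mouse"})
print(lemmatize("mice", "en"))
print(lemmatize("mice", "en", extra_dict={"mice": "MOUSE"}))
```

The per-call argument keeps everything explicit and thread-safe, while the global registry is more convenient when the same extra entries apply across many calls.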