andmek closed this issue 4 years ago
Right now we do not have a top-level Python API that lets you get the analyses with a straightforward function call. This is on our TODO list and will be added.
Until then, you can use the print_analyses script to get the analyzer output (see how it's done in turkish-morphology/analyzer/evaluator/evaluate.py).
Using the print_analyses script from within Python seems to be quite slow (around one word-form per second). Is there a faster way of analyzing word-forms?
Note: I'm using a very old computer (Intel Core 2 Duo, 8 GB of RAM, running Lubuntu 19.10) and calling something like subprocess.check_output(".../bazel-bin/scripts/print_analyses --word=kestiriyorduk").
Yes, running the script would be slow if you are using it to analyze words in bulk. It is intended for casual one-off analysis.
Calling print_analyses with subprocess.check_output would be especially slow, since the print_analyses script reads and loads the FAR, which contains the morphological analyzer FST, anew for each word (not to mention the overhead of starting a new process each time). You could try modifying the script to accept more than one input word and output the analyses for all of them in bulk. But that would just be a hack, and I'm not sure whether it would be a convenient solution for your use case.
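To illustrate why the per-word invocation is slow, here is a minimal sketch of the load-once pattern. It does not use the real analyzer: load_analyzer() and analyze_bulk() are hypothetical names, and the "analyzer" is a placeholder standing in for the expensive step of reading the FAR.

```python
import functools

LOAD_COUNT = 0  # counts how often the expensive load step actually runs


@functools.lru_cache(maxsize=None)
def load_analyzer():
    """Hypothetical stand-in for reading the FAR and building the FST."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    # Placeholder analyzer: a real one would apply the FST to the word.
    return lambda word: [f"<analysis of {word}>"]


def analyze_bulk(words):
    """Analyzes many words while paying the load cost only once."""
    analyzer = load_analyzer()  # cached after the first call
    return {word: analyzer(word) for word in words}


results = analyze_bulk(["kestiriyorduk", "geldiğinde"])
more_results = analyze_bulk(["geliyorum"])  # reuses the cached analyzer
```

Starting one process per word is the opposite of this: each subprocess call repeats the load step before it analyzes a single word.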
In any case, please subscribe to this issue. We will soon push a native Python API to this repo that will have functions to run the analyzer from Python source.
There is now a Python API (the surface_form() function of //lib:analyze.py) that you can use to run the analyzer over Turkish words from within Python code.
Please see //scripts/print_analyses.py for an example use case. The _evaluate() function of //scripts/evaluate_analyzer.py also shows a use case with parallelization over multiple CPUs.
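For bulk analysis, a rough sketch of how such a function could be mapped over many words is shown below. It is not taken from the repository's scripts. analyze_words() and fake_analyze() are hypothetical names, and the sketch assumes the analyzer function takes one word and returns a list of analysis strings, so check the actual signature of surface_form() before swapping it in.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def analyze_words(
    words: Iterable[str],
    analyze_fn: Callable[[str], List[str]],
    executor: Executor,
) -> Dict[str, List[str]]:
    """Runs analyze_fn over each distinct word using the given executor."""
    unique = list(dict.fromkeys(words))  # drop repeats, keep first-seen order
    # Executor.map yields results in the same order as its inputs.
    return dict(zip(unique, executor.map(analyze_fn, unique)))


def fake_analyze(word: str) -> List[str]:
    """Hypothetical stand-in; replace with the real analyzer call."""
    return [f"<analysis of {word}>"]


# A thread pool keeps the sketch simple. ProcessPoolExecutor offers the same
# map() interface and can spread work over CPUs, provided analyze_fn can be
# pickled and the call is guarded by `if __name__ == "__main__":`.
with ThreadPoolExecutor(max_workers=4) as pool:
    analyses = analyze_words(
        ["geldiğinde", "kestiriyorduk", "geldiğinde"], fake_analyze, pool
    )
```

Deduplicating first avoids analyzing the same word-form twice, which tends to matter for running text where frequent words repeat.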
We are planning to expand the API and to make this project available on PyPI. Therefore, I'm not closing this issue for now.
How can I use the analyzer within Python code, with something like print(analyze('geldiğinde'))?