Closed TorstenK91 closed 3 years ago
In particular for small texts, the language detection might be off which could be a downer for people trying out the API with some trivial test texts.
How about making the language parameter mandatory and allowing to pass "auto" (or some AUTO constant). We could also consider having a Languages
class/enum/constants with e.g. German, English and Auto so people do not have to wonder which values are acceptable as languages.
The default values for the language depends on the use case and should be defined in the pipeline, as they are now. Providing a default value on client level overrides the defined functionality of a pipeline. The functionality of the python api should be consistent to the usage of the other api calls, e.g., swagger. It should be possible to provide the language of the document (if it is known), but there should be no defautl value on client side.
@TorstenK91 do you want to provide a patch and open a pull request?
Yes, I'd like to do the fix myself.
Just to be clear, If we want to mimic the swagger behavior, we neither want the language to be a mandatory parameter (arg), nor do we want to send a default value as the request parameter. So I would propose that we keep it as a named argument (kwarg) in the function, with a default value "auto" (or None), which leads to the request parameter being omitted in the request call.
Regarding a list of accepted values (Languages class/enum/constants), we would need to deal with the case where a pipeline accepts more / different / less languages. So the list (in theory) depends on the pipeline, right? So unless we can retrieve the list from the pipeline, we shouldn't force a fixed list of languages (or auto)?
The reason I suggested to make it an arg (not kwarg) parameter was to raise awareness of the existence of the language parameter to reduce the risk that people just get bad results because they do not know that they can set a language. It could still remain unset.
The reason for the Python side language list is similar - to facilitate them selecting a language if they care without having to worry if they need to specify "en" or "English" or "en-US" or "eng" or whatever.
I am re-opening the issue, because we forgot to remove the default language = "de"
in the _analyse_text_xmi
endpoint (https://github.com/averbis/averbis-python-api/blob/main/averbis/core/_rest_client.py#L1284). I will create a merge request - it is literally just a line of code that needs to be changed.
Ready to be reviewed (https://github.com/averbis/averbis-python-api/pull/76)
Describe the bug When calling analyze_Text(..) without the named parameter "language", I expect that the automatic language detection is being activated. Instead, a default value "de" is set, which assigns german as the language
To Reproduce Steps to reproduce the behavior:
Example:
Expected behavior If no language parameter is set, the api parameter should be left out (and not set to a default parameter).
Please complete the following information: