analyzeText default parameter language="de" prevents automatic language detection

TorstenK91 commented 3 years ago

Describe the bug When calling analyze_Text(..) without the named parameter "language", I expect that the automatic language detection is being activated. Instead, a default value "de" is set, which assigns german as the language

To Reproduce Steps to reproduce the behavior:

Choose any pipeline that has a LanguageDetection annotator
Call analyzeText on an english text without the language parameter being set
Check the language in DocumentAnnotation

Example:

englishText="""
This text is definitely written in english and should be recognized as such.
"""
annotationTypes = ["de.averbis.types.health.DocumentAnnotation"]

annotations = pipeline.analyse_text(englishText.encode('utf-8'),annotation_types=annotationTypes)
for annot in annotations:
    if "DocumentAnnotation" in annot["type"]:
        print(annot["language"])

Expected behavior If no language parameter is set, the api parameter should be left out (and not set to a default parameter).

Please complete the following information:

Versions averbis-python-api 0.1.1, HD 5.36.1

reckart commented 3 years ago

In particular for small texts, the language detection might be off which could be a downer for people trying out the API with some trivial test texts.

How about making the language parameter mandatory and allowing to pass "auto" (or some AUTO constant). We could also consider having a Languages class/enum/constants with e.g. German, English and Auto so people do not have to wonder which values are acceptable as languages.

pkluegl commented 3 years ago

The default values for the language depends on the use case and should be defined in the pipeline, as they are now. Providing a default value on client level overrides the defined functionality of a pipeline. The functionality of the python api should be consistent to the usage of the other api calls, e.g., swagger. It should be possible to provide the language of the document (if it is known), but there should be no defautl value on client side.

@TorstenK91 do you want to provide a patch and open a pull request?

TorstenK91 commented 3 years ago

Yes, I'd like to do the fix myself.

Just to be clear, If we want to mimic the swagger behavior, we neither want the language to be a mandatory parameter (arg), nor do we want to send a default value as the request parameter. So I would propose that we keep it as a named argument (kwarg) in the function, with a default value "auto" (or None), which leads to the request parameter being omitted in the request call.

Regarding a list of accepted values (Languages class/enum/constants), we would need to deal with the case where a pipeline accepts more / different / less languages. So the list (in theory) depends on the pipeline, right? So unless we can retrieve the list from the pipeline, we shouldn't force a fixed list of languages (or auto)?

pkluegl commented 3 years ago

Yes, no argument given => not langauge set
Yes. The set of valid languages for the language detection should be configured in the pipeline. I would not provide any functionality on client side at this point.

reckart commented 3 years ago

The reason I suggested to make it an arg (not kwarg) parameter was to raise awareness of the existence of the language parameter to reduce the risk that people just get bad results because they do not know that they can set a language. It could still remain unset.

The reason for the Python side language list is similar - to facilitate them selecting a language if they care without having to worry if they need to specify "en" or "English" or "en-US" or "eng" or whatever.

DavidHuebner commented 3 years ago

I am re-opening the issue, because we forgot to remove the default language = "de" in the _analyse_text_xmi endpoint (https://github.com/averbis/averbis-python-api/blob/main/averbis/core/_rest_client.py#L1284). I will create a merge request - it is literally just a line of code that needs to be changed.

DavidHuebner commented 3 years ago

Ready to be reviewed (https://github.com/averbis/averbis-python-api/pull/76)

averbis / averbis-python-api

analyzeText default parameter language="de" prevents automatic language detection #8