Closed oliviadargel closed 2 years ago
This probably does not make sense, as merely using de_core_news_md or de_core_news_lg (instead of de_core_news_sm) results in failing NLP tests.
The md and lg models only perform correctly if the input is lowercase, which is undesirable because the result is then also shown in lowercase in the frontend. A workaround would be the built-in capitalize() function, but one disadvantage would remain: locations containing a "-" are not capitalized correctly (e.g. the input "Schleswig-Holstein" would result in "Schleswig-holstein").
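To illustrate the hyphen problem, here is a minimal Python sketch. The helper name capitalize_location is hypothetical and not part of the project; it only shows one possible way to capitalize each hyphen-separated part. It does not handle multi-word names such as "Frankfurt am Main", so it is not a complete fix.

```python
def capitalize_location(name: str) -> str:
    """Capitalize every hyphen-separated part of a location name.

    Hypothetical helper for illustration only; multi-word names
    (e.g. "frankfurt am main") are not handled correctly.
    """
    return "-".join(part.capitalize() for part in name.split("-"))


# str.capitalize() lowercases everything after the first character,
# which is why the part after the hyphen loses its capital letter.
naive = "schleswig-holstein".capitalize()
fixed = capitalize_location("schleswig-holstein")
print(naive)
print(fixed)
```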
Another solution would be to have the md or lg model process the input twice: once as the user entered it and once in lowercase. Locations are searched for in the lowercase input (generally as is done currently), but if a location is found, we use the token with the same index from the original user input. This would be more time-consuming and should therefore be discussed in the NLP team.
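The index-mapping idea could look roughly like the sketch below. It is deliberately independent of spaCy: toy_recognizer and its KNOWN_LOCATIONS set are stand-ins (assumptions for illustration) for whatever the md or lg model would return on the lowercase input, and whitespace splitting stands in for the real tokenizer. Lowercasing is done per token so that both token lists are guaranteed to have the same indices.

```python
from typing import Callable, List


def find_locations_preserving_case(
    text: str,
    recognize: Callable[[List[str]], List[int]],
) -> List[str]:
    """Detect locations on lowercase tokens, return the original-case tokens."""
    original_tokens = text.split()
    # Lowercase token by token so indices stay aligned with the original.
    lowered_tokens = [token.lower() for token in original_tokens]
    hit_indices = recognize(lowered_tokens)
    return [original_tokens[i] for i in hit_indices]


# Stand-in for the model's location recognition (assumption, not spaCy).
KNOWN_LOCATIONS = {"schleswig-holstein", "berlin"}


def toy_recognizer(tokens: List[str]) -> List[int]:
    """Return the indices of tokens that match a known lowercase location."""
    return [i for i, token in enumerate(tokens) if token in KNOWN_LOCATIONS]


locations = find_locations_preserving_case(
    "Wetter in Schleswig-Holstein und Berlin", toy_recognizer
)
print(locations)
```

In a real pipeline the extra cost comes from running the model a second time; the mapping step itself is cheap.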
In any case, using the md or lg model results in at least one failing test, because one location that the sm model recognizes is not recognized.
[1] Link to all models
[2] Short explanation of spaCy models
The NLP team has agreed to keep the small model de_core_news_sm for the time being.
User story
Acceptance criteria
Definition of done