digitalfabrik / integreat-cms

Simplified content management back end for the Integreat App - a multilingual information platform for newcomers
https://digitalfabrik.github.io/integreat-cms/
Apache License 2.0

Error at retrieving HIX value when the page content has only a video #2916

Closed MizukiTemma closed 1 month ago

MizukiTemma commented 1 month ago

Describe the Bug

HIX value cannot be retrieved when a page has only a video in its content.

Steps to Reproduce

  1. Go to Testumgebung in the test system.
  2. Go to the page "Integreat in Gebärdensprache (Video)"
  3. See the error "HIX value could not be calculated. Please try again later."

This is reproducible locally: copy the source code of the page and paste it into a page in the local system, and you'll see `HIX benchmark API call failed: <HTTPError 400: 'Bad Request'>`.

Expected Behavior

No error

Actual Behavior

Error appears

Additional Information

See also #2917. If some words are added to the content, the HIX value is retrieved and saved successfully.

MizukiTemma commented 1 month ago

Solution from the issue grooming: do not send the text to TextLab if the page does not contain any real text.
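
A minimal sketch of what such a check could look like, assuming "real text" means at least one non-whitespace text node once all HTML tags are ignored. The names `has_real_text` and `_TextCollector` are hypothetical and not taken from the integreat-cms code base; only the standard library `html.parser` module is used.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Hypothetical helper: collects text nodes and ignores all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; character references such as
        # &nbsp; are already decoded at this point.
        self.chunks.append(data)


def has_real_text(content):
    """Return True if the HTML content has any non-whitespace text."""
    parser = _TextCollector()
    parser.feed(content)
    parser.close()  # flush any text still buffered by the parser
    return any(chunk.strip() for chunk in parser.chunks)
```

The caller would then skip the TextLab request whenever `has_real_text(content)` is false, for example for a page whose body is only an embedded video. Note that this sketch would still count fallback text placed inside a `<video>` element as real text.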

PeterNerlich commented 1 month ago

I think we could use a combination of two conditions to determine whether we regard a page as empty, as containing only non-textual content, or as content to be evaluated by TextLab: