CentreForDigitalHumanities / I-analyzer

The great textmining tool that obviates all others
https://ianalyzer.hum.uu.nl
MIT License

Timeline visualisation in DBNL broken #1683

Closed: lukavdplas closed this issue 3 weeks ago

lukavdplas commented 4 weeks ago

What went wrong?

I tried using the "number of results" visualisation on the DBNL corpus. It works when comparing by periodical, author, etc., but not by year: the visualisation does not render and keeps showing a loading spinner.

The request for the visualisation data returns a 500 status (internal server error). Excerpt from the server logs:

ERROR:es.views:BadRequestError(400, 'x_content_parse_exception', '[1:55] [terms] failed to parse field [size]')
Traceback (most recent call last):
  File ".../backend/es/views.py", line 62, in post
    results = client.search(
  File ".../python3.9/site-packages/elasticsearch/_sync/client/utils.py", line 414, in wrapped
    return api(*args, **kwargs)
  File ".../python3.9/site-packages/elasticsearch/_sync/client/__init__.py", line 3863, in search
    return self.perform_request(  # type: ignore[return-value]
  File ".../python3.9/site-packages/elasticsearch/_sync/client/_base.py", line 320, in perform_request
    raise HTTP_EXCEPTIONS.get(meta.status, ApiError)(
elasticsearch.BadRequestError: BadRequestError(400, 'x_content_parse_exception', '[1:55] [terms] failed to parse field [size]')
ERROR:django.request:Internal Server Error: /api/es/dbnl/_search

What did you expect to happen?

The visualisation should be rendered without issue.

Also, if the request for data fails, the visualisation should show an error message.

Screenshot

screenshot of I-analyzer showing a loading spinner

Where did you find the bug?

Version

5.13.0

Steps to reproduce

Go to https://ianalyzer.hum.uu.nl/search/dbnl?query=test&tab=visualizations&visualize=resultscount&visualizedField=year

lukavdplas commented 4 weeks ago

Looks like the issue is the following:

If the "number of results" visualisation is used on a field with a range filter (an integer or float field), the number of bins is computed from the lower and upper bounds in the filter settings.

However, bounds for range filters are now optional, and the query is not built correctly when they are missing. The DBNL corpus does not specify a lower or upper bound, hence the error.
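The failure mode and one way to guard against it can be sketched in Python. This is a minimal illustration under stated assumptions, not I-analyzer's actual code: `terms_size`, `terms_aggregation` and `DEFAULT_SIZE` are hypothetical names, and the fallback of 100 buckets is arbitrary. The sketch keeps a range-based size when both bounds are set (counting both ends, so years 1800 to 1900 give 101 buckets) and falls back to a fixed size when either bound is missing, so the request always carries a valid integer `size`.

```python
import math
from typing import Optional

# Hypothetical fallback used when a range filter has no bounds configured.
DEFAULT_SIZE = 100


def terms_size(lower: Optional[float], upper: Optional[float],
               default: int = DEFAULT_SIZE) -> int:
    """Return an integer `size` for a terms aggregation on a range-filtered field."""
    if lower is None or upper is None:
        # Bounds are optional: without them the range formula is undefined,
        # so use a fixed size instead of sending an invalid value.
        return default
    # Count whole values in the inclusive range, e.g. 1800..1900 -> 101.
    # Never return less than 1, which would not be a useful bucket count.
    return max(1, math.floor(upper) - math.ceil(lower) + 1)


def terms_aggregation(field: str, lower: Optional[float],
                      upper: Optional[float]) -> dict:
    """Build a request body for a terms aggregation on `field` (no hits returned)."""
    return {
        "size": 0,
        "aggs": {
            field: {"terms": {"field": field, "size": terms_size(lower, upper)}}
        },
    }
```

This only covers the missing-bounds case; for float fields a size derived from the bounds is still a poor estimate of how many distinct values exist.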

By the way, the formula is essentially size = upper - lower, which will also cause problems for float fields: if all values lie between 0 and 1, the size is at most 1, no matter how many distinct values the field contains.
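One possible way around the float problem, offered here as an assumption rather than the fix that was adopted, is to stop deriving `size` from the filter bounds altogether. Instead, ask Elasticsearch how many distinct values the field has with its `cardinality` aggregation, then use that count (capped) as the terms `size`. The helper names and the cap of 10,000 buckets are hypothetical; note also that `cardinality` returns an approximate count for fields with many distinct values.

```python
from typing import Any, Dict

# Hypothetical upper limit on the number of buckets requested.
MAX_BUCKETS = 10_000


def cardinality_query(field: str) -> Dict[str, Any]:
    """Request body that counts distinct values of `field` without returning hits."""
    return {"size": 0, "aggs": {"distinct": {"cardinality": {"field": field}}}}


def size_from_cardinality(response: Dict[str, Any], cap: int = MAX_BUCKETS) -> int:
    """Turn a cardinality response into a valid terms-aggregation size in 1..cap."""
    distinct = response["aggregations"]["distinct"]["value"]
    return max(1, min(int(distinct), cap))
```

With values spread between 0 and 1, the range formula gives at most one bucket, whereas the distinct count reflects what is actually in the index. The trade-off is an extra request before building the visualisation query.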