AtlasOfLivingAustralia / ecodata

Data capture webservices supporting Biocollect and other apps
https://ecodata.ala.org.au/documentation/index

Elasticsearch error when user query contains an Elasticsearch reserved character #888

Open cofiem opened 10 months ago

cofiem commented 10 months ago

Hello,

When a user searches on the project or organisation list pages and the query contains punctuation (essentially, any character in the Elasticsearch reserved characters list), Elasticsearch can fail to parse the query and throw an error.

For example, a search for ] throws this error:

Nov 14 10:34:09 bash[2638101]: 2023-11-14 10:34:09.074 DEBUG --- [0.0-8718-exec-6] au.org.ala.ecodata.ElasticSearchService  : search params: [offset:0, max:10, query:], fq:className:au.org.ala.ecodata.Organisation, highlight:true, flimit:999, controller:search, action:elastic]
Nov 14 10:34:09 bash[2638101]: 2023-11-14 10:34:09.074 DEBUG --- [0.0-8718-exec-6] au.org.ala.ecodata.ElasticSearchService  : filters = className:au.org.ala.ecodata.Organisation; flimit = 999
Nov 14 10:34:09 bash[2638101]: 2023-11-14 10:34:09.093 ERROR --- [0.0-8718-exec-6] StackTrace                               : Full Stack Trace:
Nov 14 10:34:09 bash[2638101]: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
Nov 14 10:34:09 bash[2638101]:         at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:178)
Nov 14 10:34:09 bash[2638101]:         at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:2484)
Nov 14 10:34:09 bash[2638101]:         at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:2461)
Nov 14 10:34:09 bash[2638101]:         at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:2184)
Nov 14 10:34:09 bash[2638101]:         at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:2137)
Nov 14 10:34:09 bash[2638101]:         at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:2105)
Nov 14 10:34:09 bash[2638101]:         at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1367)
Nov 14 10:34:09 bash[2638101]:         at org.elasticsearch.client.RestHighLevelClient$search$3.call(Unknown Source)
Nov 14 10:34:09 bash[2638101]:         at au.org.ala.ecodata.ElasticSearchService.search(ElasticSearchService.groovy:1420)
Nov 14 10:34:09 bash[2638101]:         at au.org.ala.ecodata.ElasticSearchService.search(ElasticSearchService.groovy)
Nov 14 10:34:09 bash[2638101]:         at au.org.ala.ecodata.ElasticSearchService$search$12.call(Unknown Source)
Nov 14 10:34:09 bash[2638101]:         at au.org.ala.ecodata.SearchController.elastic(SearchController.groovy:57)
[..snip..]
Nov 14 10:34:09 bash[2638101]:         Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9200], URI [/search/_search?typed_keys=true&max_concurrent_shard_requests=5&search_type=dfs_query_then_fetch&batched_reduce_size=512], status line [HTTP/1.1 400 Bad Request]
Nov 14 10:34:09 bash[2638101]: {"error":{"root_cause":[{"type":"query_shard_exception","reason":"Failed to parse query []]","index_uuid":"4RfEsukVTeisdgleTkFCTw","index":"production_search_1"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"production_search_1","node":"rApGSGVLSFWM6t505WIh4A","reason":{"type":"query_shard_exception","reason":"Failed to parse query []]","index_uuid":"4RfEsukVTeisdgleTkFCTw","index":"production_search_1","caused_by":{"type":"parse_exception","reason":"Cannot parse ']': Lexical error at line 1, column 2.  Encountered: <EOF> after : \"\"","caused_by":{"type":"token_mgr_error","reason":"Lexical error at line 1, column 2.  Encountered: <EOF> after : \"\""}}}}]},"status":400}
Nov 14 10:34:09 bash[2638101]:                 at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:342)
Nov 14 10:34:09 bash[2638101]:                 at org.elasticsearch.client.RestClient.performRequest(RestClient.java:312)
Nov 14 10:34:09 bash[2638101]:                 at org.elasticsearch.client.RestClient.performRequest(RestClient.java:287)
Nov 14 10:34:09 bash[2638101]:                 at org.elasticsearch.client.RestHighLevelClient.performClientRequest(RestHighLevelClient.java:2699)
Nov 14 10:34:09 bash[2638101]:                 at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:2171)
Nov 14 10:34:09 bash[2638101]:                 ... 75 common frames omitted

On the Biocollect Organisation list page (<domain>/organisation/list), this results in the JavaScript error Uncaught TypeError: c.hits is undefined when the page tries to paginate the results.

This is in Ecodata v3.3 and Biocollect v5.2.6. However, looking at the code for the current versions (v4.3 and v6.6.5), it looks like this is still a problem:

QueryStringQueryBuilder qsQuery = queryStringQuery(query)
client.search(request, RequestOptions.DEFAULT)

I am not entirely sure what should be done about this. The obvious fix would be to treat every query as opaque text and escape all reserved characters, but I can't find a built-in Elasticsearch API that does that. It is possible to escape the characters manually by prefixing each reserved character with a backslash (\<char>).
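
To make the manual escaping idea concrete, here is a minimal sketch in plain Java (callable from Groovy). The class name QueryEscaper and the method escape are hypothetical and are not part of ecodata. It assumes the reserved characters listed in the Elasticsearch query string documentation, and it assumes that < and > should be dropped rather than escaped, since that documentation says they cannot be escaped.

```java
// Hypothetical helper, not part of ecodata: escapes user input before it is
// passed to a query_string query.
final class QueryEscaper {

    // Reserved characters that can be escaped with a backslash.
    // '&' and '|' are escaped individually, which also covers "&&" and "||".
    private static final String ESCAPABLE = "+-=&|!(){}[]^\"~*?:\\/";

    private QueryEscaper() {
    }

    // Returns the query with every reserved character prefixed by a backslash.
    // '<' and '>' are removed (assumption: they cannot be escaped).
    // A null query is treated as an empty string.
    static String escape(String query) {
        if (query == null) {
            return "";
        }
        StringBuilder escaped = new StringBuilder(query.length() * 2);
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '<' || c == '>') {
                continue; // drop characters that cannot be escaped
            }
            if (ESCAPABLE.indexOf(c) >= 0) {
                escaped.append('\\');
            }
            escaped.append(c);
        }
        return escaped.toString();
    }
}
```

With something like this, the call above could become queryStringQuery(QueryEscaper.escape(query)). The trade-off is that users would lose wildcard and field syntax (*, ?, field:value) in their searches, so this only fits if queries really are meant to be opaque. Lucene's classic query parser also has a static escape helper that may be worth checking before hand-rolling this.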

What do you think? Have you seen this issue?