Closed ansell closed 1 year ago
These queries also have the potential to affect bie-index
performance through calls to
2019-01-04 11:19:46,389 [http-bio-8080-exec-373] INFO au.org.ala.biocache.service.SpeciesLookupRestService (SpeciesLookupRestService.java:396) - Requesting: https://bie.ala.org.au/ws/guid/batch?q=*
Another outage today was caused by this qid, which might as well be requesting everything even though it has two basic filters on it that don't cut it down very far https://biocache-ws.ala.org.au/ws/qid/1559327224101
Another short outage today caused by a similar query https://biocache-ws.ala.org.au/ws/qid/1567141555695
Another outage today caused by text:*
Also need to configurably block all of these generic spatial portal filters unless other more specific filters are included as biocache-service/solr cannot handle them and people need to be taught to be more specific in their queries:
https://biocache-ws.ala.org.au/ws/qid/1585004472884
{
"rowKey": "1585004472884",
"q": "geospatial_kosher:true",
"displayString": "Spatial validity:Spatially valid",
"wkt": "",
"lastUse": 1585101191084,
"fqs": [
"-occurrence_status_s:absent",
"longitude:[112 TO 154]",
"latitude:[-44 TO -9]"
],
"maxAge": -1
}
Also need a way of configurably blocking bogus queries caused by other ALA products that have coding bugs which are not being fixed in the relevant products such as the following:
text:undefined
text:null
qid:undefined
qid:null
@alexhuang091 this issue is relevant to my post on Slack about NOT failing-over to *:*
when no q
param is detected.
I think a search of code for *:*
might be needed to find all cases.
Also note the configurably
element to this change - needs to be implemented via feature flag so LA users can turn it off or we can back it out via config.
Is q='*:*'
the only way to search for all so we only need to handle this case? I mean what else can be in q
to search for all
apart from
text:undefined
text:null
qid:undefined
qid:null
which effectively returns large amount of records
I think a first change could be to change the way biocache currently handles an empty q
param - which is to substitute in *:*
for it. I think we do want allow users to enter *:*
as there are reports on the dashboard which use this and blocking it would break those.
This would need changes in both biocache-service and biocache-hubs, as they both have code that does that substitution. Searching GH code for *:*
is a good place to start.
So those cases above - I think we'd want to invalidate those requests and push back a BAD_REQUEST http code.
I think a first change could be to change the way biocache currently handles an empty
q
param - which is to substitute in*:*
for it. I think we do want allow users to enter*:*
as there are reports on the dashboard which use this and blocking it would break those.This would need changes in both biocache-service and biocache-hubs, as they both have code that does that substitution. Searching GH code for
*:*
is a good place to start.So those cases above - I think we'd want to invalidate those requests and push back a BAD_REQUEST http code.
Hi Nick, biocache-service
doesn't substitute empty q
with *:*
,
Commented out this line in biocache-hubs
causing q=''
in searchParams
if (!requestParams.q) {
// requestParams.q = "*:*"
}
The result is this:
Bai found an issue where the q
param was getting stripped off some POST calls to biocache-service (between SP sending it and arriving at biocache-service, via Nginx) and the results from biocache-service was to show all records, so it must be happening there somewhere. Note: hubs not involved at all in that case.
Here we go https://biocache-ws.ala.org.au/ws/occurrences/search?pageSize=9
returns all records ad shows query: "?q=*%3A*",
in response.
EDIT: I think SOLR might be doing this via the defaultQuery
option... in ansible
Hi Nick,
@RequestMapping(value = {"/occurrences/search.json*","/occurrences/search*"}, method = RequestMethod.GET)
public @ResponseBody SearchResultDTO occurrenceSearch(SpatialSearchRequestParams requestParams,
@RequestParam(value="apiKey", required=false) String apiKey,
If there's no q
in the url, default *:*
will be used. This happens upon service receiving the request.
Try this https://biocache-ws.ala.org.au/ws/occurrences/search?pageSize=9&q=
Biocache-hubs is a bit special, it has its own definition of SearchRequestParams
and q=''
is default, and it replaces any '' with *:*
, so Biocache-hubs never sends empty q,
So the conclusion is:
*:*
will be used.Agree. So I think we need 2 changes:
*:*
in biocache-hubs defaultQuery
, so if no q
, then we DON'T substitute *:*
.No longer an issue.
We cannot support many concurrent all-records queries because of infrastructure budget limitations. One source of long queries are queries for everything
text:*
,q=*
,q=*:*
. It would be useful to optionally blacklist these queries inbiocache-service
to prevent them being executed instead of letting them get to the timeout and users fail to get results in those cases anyway. User interfaces could be configured to detect these calls and reject them preemptively, but there are many UIs directly accessingbiocache-service
, so the blacklisting needs to be done at this level to be effective.