gbif / occurrence

Occurrence store, download, search
Apache License 2.0

Predicate support for free text queries #288

Open MortenHofft opened 2 years ago

MortenHofft commented 2 years ago

Prompted by https://github.com/gbif/hosted-portals/issues/225

For maps on hosted portals we use the option to post a predicate to the map service at map/occurrence/adhoc/predicate/ and get a token back. That token is then used to request map tiles. But since predicates do not support free-text queries, we cannot visualize maps with free-text q filters. That is a shame.
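
A minimal sketch of the first step of that flow, using only the Python standard library. The base URL `https://api.gbif.org/v2/`, the `{"predicate": ...}` request body (borrowed from the snippet further down), and the helper name `build_adhoc_predicate_request` are assumptions for illustration. This thread does not describe the exact response format, so the request is only built here, not sent.

```python
import json
import urllib.request

# Assumed base URL; the thread only names the path map/occurrence/adhoc/predicate/.
API_BASE = "https://api.gbif.org/v2/"

def build_adhoc_predicate_request(predicate: dict) -> urllib.request.Request:
    """Hypothetical helper: prepare (but do not send) the POST that registers a predicate.

    The {"predicate": ...} wrapper is an assumption, not a documented contract.
    """
    body = json.dumps({"predicate": predicate}).encode("utf-8")
    return urllib.request.Request(
        API_BASE + "map/occurrence/adhoc/predicate/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# An ordinary (non-free-text) predicate of the kind the map service already accepts.
req = build_adhoc_predicate_request({"type": "equals", "key": "COUNTRY", "value": "GR"})
# Sending it (e.g. urllib.request.urlopen(req)) is where the token for tiles would come back.
```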

One way to solve this is to add a new predicate type for free text, supported only for search and maps, not for downloads.

Something along these lines:

{
  "predicate": {
    "type": "fullTextSearch",
    "value": "dog"
  }
}

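
To illustrate how the proposed type would nest inside an ordinary predicate tree, here is a sketch that combines it with a country filter through an `and` predicate. `fullTextSearch` exists only as the proposal above, not as a current predicate type; the `and`/`equals` shapes follow GBIF's existing predicate JSON, and the function names are hypothetical.

```python
import json

def full_text_predicate(text: str) -> dict:
    """Hypothetical constructor for the proposed (not yet existing) fullTextSearch predicate."""
    return {"type": "fullTextSearch", "value": text}

def and_predicate(*predicates: dict) -> dict:
    """Combine child predicates with the existing 'and' predicate type."""
    return {"type": "and", "predicates": list(predicates)}

combined = and_predicate(
    {"type": "equals", "key": "COUNTRY", "value": "GR"},
    full_text_predicate("dog"),
)
print(json.dumps({"predicate": combined}, indent=2))
```

Because the free-text filter is just another node in the tree, the map and search services could accept it wherever they already accept predicates.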
MattBlissett commented 2 years ago

https://github.com/gbif/occurrence/blob/6c1b75c886e563d3bd70e4ce69be3de7e9e074c2/occurrence-heatmaps/src/main/java/org/gbif/occurrence/search/heatmap/OccurrenceHeatmapRequestProvider.java#L70-L89

As implemented, the full-text search q parameter can already be supplied alongside predicateHash:

https://api.gbif.org/v2/map/occurrence/adhoc/2/2/0@2x.png?style=scaled.circles&mode=GEO_CENTROID&srs=EPSG%3A3857&squareSize=256&predicateHash=-280168075

https://api.gbif.org/v2/map/occurrence/adhoc/2/2/0@2x.png?style=scaled.circles&mode=GEO_CENTROID&srs=EPSG%3A3857&squareSize=256&predicateHash=-280168075&q=Greece
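
The two URLs above differ only by the trailing q parameter. A small sketch that rebuilds both from their parts; the fixed style/mode/srs/squareSize values are copied from the examples, and the function name `adhoc_tile_url` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# Base path and fixed parameters copied from the two example URLs in this thread.
ADHOC_TILE_BASE = "https://api.gbif.org/v2/map/occurrence/adhoc"

def adhoc_tile_url(z: int, x: int, y: int, predicate_hash: int, q: Optional[str] = None) -> str:
    """Build an adhoc map tile URL, optionally adding the free-text q parameter."""
    params = {
        "style": "scaled.circles",
        "mode": "GEO_CENTROID",
        "srs": "EPSG:3857",  # urlencode escapes ':' as %3A
        "squareSize": 256,
        "predicateHash": predicate_hash,
    }
    if q is not None:
        # Appended after the predicate hash; the server apparently combines the two.
        params["q"] = q
    return f"{ADHOC_TILE_BASE}/{z}/{x}/{y}@2x.png?{urlencode(params)}"

without_q = adhoc_tile_url(2, 2, 0, -280168075)
with_q = adhoc_tile_url(2, 2, 0, -280168075, q="Greece")
```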

@fmendezh, was it your intention for queries to work like this?

timrobertson100 commented 2 years ago

I was pinged on Slack to comment on this. I think our options here are:

  1. Add fullTextSearch as a predicate as proposed in this issue, and throw an IllegalArgumentException (IAE) in the download API if it is given a predicate containing this filter (SQL-driven downloads don't currently support full-text search)
  2. Change the React libraries to handle the q parameter, which the adhoc map API seemingly already supports (as @MattBlissett describes in the comment above)
  3. Expose a new endpoint in maps that accepts an ES query directly, executes it, and renders the result
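
Option 1 amounts to a validation step on the download side. Below is a sketch of that check, not GBIF's actual implementation: a Python `ValueError` subclass stands in for the Java IllegalArgumentException, the set of rejected types is an assumption (only `fullTextSearch`), and the `not` predicate is assumed to carry a single child under a `predicate` key.

```python
class UnsupportedPredicateError(ValueError):
    """Python stand-in for the IllegalArgumentException the download API would throw."""

# Assumption: fullTextSearch is the only type SQL-driven downloads would reject.
DOWNLOAD_UNSUPPORTED_TYPES = {"fullTextSearch"}

def check_download_predicate(predicate: dict) -> None:
    """Walk a predicate tree and raise if it contains a type downloads cannot execute."""
    ptype = predicate.get("type")
    if ptype in DOWNLOAD_UNSUPPORTED_TYPES:
        raise UnsupportedPredicateError(f"predicate type {ptype!r} is not supported for downloads")
    # 'and' / 'or' predicates carry a list of children.
    for child in predicate.get("predicates", []):
        check_download_predicate(child)
    # 'not' predicates carry a single child.
    if isinstance(predicate.get("predicate"), dict):
        check_download_predicate(predicate["predicate"])

def is_downloadable(predicate: dict) -> bool:
    """Convenience wrapper: True when every node in the tree is supported by downloads."""
    try:
        check_download_predicate(predicate)
    except UnsupportedPredicateError:
        return False
    return True

search_only = {
    "type": "and",
    "predicates": [
        {"type": "equals", "key": "COUNTRY", "value": "GR"},
        {"type": "not", "predicate": {"type": "fullTextSearch", "value": "dog"}},
    ],
}
```

Search and map endpoints would simply skip this check, which is exactly the asymmetry option 1 accepts.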

Of these, option 2 seems the most convoluted, both for the client implementation and for the API design (a query predicate plus an additional query parameter that has to be merged in).

Option 1 seems a reasonable approach to me. The predicate model was originally intended to abstract the query format from the underlying tech (Solr/ES and SQL) for downloads, but it isn't strictly tied to downloads; it's just the syntax used for encoding queries. Fundamentally, we have a search system that supports more than the download system, and APIs that allow the predicate syntax to hit both. It's not ideal, but it doesn't seem too drastic for the download API to reject some queries as unsupported. We don't document or promote the predicate syntax for search, for good reason, so I wouldn't suggest documenting this capability for public use if it is added.

Option 3 also seems reasonable. It exposes internals, but we already have a GraphQL API that knows the ES schema, so that component already takes responsibility for translating incoming queries into ES searches. Since it does that for the table views, it's reasonable for it to also take responsibility for the ES query the VectorTile server needs to format the search response. If we prefer not to expose the ES syntax, some server-side state could hide it, similar to how we do elsewhere (e.g. the GraphQL component could register the ES query with the VT server and give the client a token, which the client then uses to hit the VT server).
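
A toy sketch of the token indirection suggested for option 3, in which the ES syntax never leaves the server. The class `QueryRegistry`, the hashing scheme, and the example `query_string` query are illustrative assumptions; a real VT server would need shared, persistent storage rather than an in-process dict.

```python
import hashlib
import json

class QueryRegistry:
    """Hypothetical in-memory store standing in for server-side state in the VT server."""

    def __init__(self) -> None:
        self._queries: dict = {}

    def register(self, es_query: dict) -> str:
        """Store an ES query and return an opaque token derived from its canonical JSON."""
        canonical = json.dumps(es_query, sort_keys=True, separators=(",", ":"))
        token = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
        self._queries[token] = es_query
        return token

    def lookup(self, token: str) -> dict:
        """Resolve a token back to the registered query (raises KeyError if unknown)."""
        return self._queries[token]

registry = QueryRegistry()
# Step 1: the GraphQL component translates the client's filter into an ES query and registers it.
es_query = {"query": {"query_string": {"query": "dog"}}}
token = registry.register(es_query)
# Step 2: the client only ever sees the token and requests tiles with it.
# Step 3: the VT server resolves the token back to the ES query before searching.
resolved = registry.lookup(token)
```

Deriving the token from a hash of the canonical JSON makes registration idempotent, so repeated requests for the same filter reuse one token, similar in spirit to the predicateHash parameter in the tile URLs earlier in this thread.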

@MortenHofft - did I capture the options correctly please?