Closed by jan-niestadt 8 months ago
Maybe a better alternative: add a new query language that is just a JSON structure describing the SpanQuery structure to instantiate, e.g. pattlang=jsonq. This gives us a clean way to play around with all the query features. It would also be easier to create a query builder for this, because it's easier to serialize to/from this JSON structure than CQL.
For example, <s sentiment="happy" /> !containing "whee" is currently not a valid BlackLab CQL query (because of the !containing operator). The alternative <s sentiment="happy" /> & !(<s/> containing "whee") is possible, but it is currently not optimized to the structure suggested by the first query. In jsonq you could just specify exactly the query structure you want:
{
  "type": "posfilter",
  "operation": "containing",
  "invert": true,
  "producer": {
    "type": "tag",
    "tagName": "s",
    "attributes": {
      "sentiment": "happy"
    }
  },
  "leftAdjust": 0,
  "rightAdjust": 0,
  "filter": {
    "type": "term",
    "term": "whee"
  }
}
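To illustrate why a query builder is easier to write against this structure than against CQL, here is a minimal Python sketch that builds the example above from small helper functions and round-trips it through JSON. The helper names (`term`, `tag`, `posfilter`) are hypothetical, and the field names simply mirror the example in this comment; the structure actually implemented may differ.

```python
import json


def term(value):
    """Hypothetical helper: a single-term node, as in the example above."""
    return {"type": "term", "term": value}


def tag(name, **attributes):
    """Hypothetical helper: a tag node with optional attribute constraints."""
    return {"type": "tag", "tagName": name, "attributes": attributes}


def posfilter(producer, filter_, operation, invert=False,
              left_adjust=0, right_adjust=0):
    """Hypothetical helper: a position filter node wrapping two sub-queries."""
    return {
        "type": "posfilter",
        "operation": operation,
        "invert": invert,
        "producer": producer,
        "leftAdjust": left_adjust,
        "rightAdjust": right_adjust,
        "filter": filter_,
    }


# <s sentiment="happy" /> !containing "whee"
query = posfilter(tag("s", sentiment="happy"), term("whee"),
                  "containing", invert=True)

# Serializing and parsing is a single call each way; no CQL parser needed.
serialized = json.dumps(query, indent=2)
restored = json.loads(serialized)
```

Because the query is plain nested data, a UI query builder can edit the tree directly and serialize it without having to generate or parse CQL text.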
This would be somewhat implementation-dependent, although SpanQuery classes are usually only added. If one were ever removed or changed significantly, we could maintain support for its jsonq syntax by rewriting it to the most obvious modern alternative.
This is now possible on dev. The JSON structure for a BCQL query is returned in summary.pattern.json, and the same JSON structure can be passed in the patt parameter as well. See https://inl.github.io/BlackLab/server/rest-api/corpus/parse-pattern/get.html#json-query-structure
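As a rough sketch of passing a JSON query in the patt parameter, the Python snippet below URL-encodes a JSON structure into a request URL. The base URL, the corpus name and the shape of `json_query` are placeholders, not taken from this thread; see the linked documentation for the actual JSON query structure and endpoints.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical server location and corpus name; adjust to your installation.
BASE_URL = "http://localhost:8080/blacklab-server/corpora/my-corpus/hits"


def build_request_url(base_url, json_query):
    """Serialize the query compactly and URL-encode it as the patt parameter."""
    patt = json.dumps(json_query, separators=(",", ":"))
    return base_url + "?" + urlencode({"patt": patt})


# Placeholder query; the real structure is described in the linked docs.
json_query = {"type": "term", "term": "whee"}

url = build_request_url(BASE_URL, json_query)

# Decoding the URL again recovers the original structure.
decoded = json.loads(parse_qs(urlsplit(url).query)["patt"][0])
```

Using `urlencode` matters here because the JSON contains braces, quotes and colons that are not safe to put in a query string unescaped.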
(this comment was superseded, see below)
E.g. add an extension function _posfilter(producer, filter, operation, invert) that just creates a SpanQueryPositionFilter. Every query's toString() would also be updated to produce a working query, so also _posfilter(...) in this example. This makes experimentation with new features and optimizations easier, because you can just try out different low-level queries in the user interface and compare the differences in speed and results.
These functions should start with an underscore to reflect that they're not really intended as a stable, user-friendly CQL extension for end users and may change at any time.
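The idea that every query's toString() should produce a working query can be sketched as follows. This is a Python illustration, not BlackLab code: the `Raw` and `PosFilter` classes are hypothetical, and the way the operation and invert arguments are written in the output is an assumption, since the comment only gives the function signature.

```python
from dataclasses import dataclass


@dataclass
class Raw:
    """Hypothetical leaf node that already holds its own query text."""
    text: str

    def __str__(self):
        return self.text


@dataclass
class PosFilter:
    """Hypothetical stand-in for a position filter query node."""
    producer: object
    filter: object
    operation: str
    invert: bool = False

    def __str__(self):
        # Emit the low-level function-call form so the string can be
        # pasted back in as a query (argument syntax is assumed).
        invert = str(self.invert).lower()
        return (f"_posfilter({self.producer}, {self.filter}, "
                f"'{self.operation}', {invert})")


q = PosFilter(Raw('<s sentiment="happy" />'), Raw('"whee"'),
              "containing", invert=True)
text = str(q)
```

Nesting works without extra code, because each node's string form is built from the string forms of its children.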