Open epifanio opened 10 months ago
@ferrighi @magnarem
Assuming I want to execute the following query:
(field_to_query = text_a in bbox_1) OR (field_to_query = text_b in bbox_2)
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<csw:GetRecords xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0" xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:ogc="http://www.opengis.net/ogc" service="CSW" version="2.0.2" resultType="results" startPosition="1" maxRecords="5" outputFormat="application/xml" outputSchema="http://www.isotc211.org/2005/gmd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd" xmlns:gml="http://www.opengis.net/gml" xmlns:gmd="http://www.isotc211.org/2005/gmd">
<csw:Query typeNames="gmd:MD_Metadata">
<csw:ElementSetName>brief</csw:ElementSetName>
<csw:Constraint version="1.1.0">
<ogc:Filter>
<ogc:Or>
<ogc:And>
<ogc:PropertyIsLike wildCard="*" singleChar="?" escapeChar="\\" matchCase="false">
<ogc:PropertyName>dc:{field_to_query}</ogc:PropertyName>
<ogc:Literal>text_a</ogc:Literal>
</ogc:PropertyIsLike>
<ogc:BBOX>
<ogc:PropertyName>apiso:BoundingBox</ogc:PropertyName>
<gml:Envelope>
<gml:lowerCorner>{lowerCorner_1}</gml:lowerCorner>
<gml:upperCorner>{upperCorner_1}</gml:upperCorner>
</gml:Envelope>
</ogc:BBOX>
</ogc:And>
<ogc:And>
<ogc:PropertyIsLike wildCard="*" singleChar="?" escapeChar="\\" matchCase="false">
<ogc:PropertyName>dc:{field_to_query}</ogc:PropertyName>
<ogc:Literal>text_b</ogc:Literal>
</ogc:PropertyIsLike>
<ogc:BBOX>
<ogc:PropertyName>apiso:BoundingBox</ogc:PropertyName>
<gml:Envelope>
<gml:lowerCorner>{lowerCorner_2}</gml:lowerCorner>
<gml:upperCorner>{upperCorner_2}</gml:upperCorner>
</gml:Envelope>
</ogc:BBOX>
</ogc:And>
</ogc:Or>
</ogc:Filter>
</csw:Constraint>
</csw:Query>
</csw:GetRecords>
What would the equivalent Solr syntax look like? Would it be something like:
{
"q": "*:*",
"q.op": "OR",
"start": 0,
"rows": "5",
"fq": [
"metadata_status:Active",
"collection:ADC",
"field_to_query:(text_a text_b)",
"{!field f=bbox score=overlapRatio}Within(ENVELOPE_1)",
"{!field f=bbox score=overlapRatio}Within(ENVELOPE_2)"
]
}
When searching for keywords in different fields, the syntax is fieldname:querystring.
For example, "q": "title:ice abstract:core" matches on ice in the title and core in the abstract.
The q.op parameter then decides whether these two field queries are combined with OR or AND.
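As a rough illustration of how such a fielded query ends up in a request, here is a minimal Python sketch using only the standard library. The helper name `fielded_query` is hypothetical, and no escaping of Solr special characters is attempted:

```python
from urllib.parse import parse_qs, urlencode


def fielded_query(terms):
    """Join (field, text) pairs into Solr's fieldname:querystring form."""
    return " ".join(f"{field}:{text}" for field, text in terms)


params = {
    "q": fielded_query([("title", "ice"), ("abstract", "core")]),
    "q.op": "OR",  # switch to "AND" to require both field matches
    "start": 0,
    "rows": 5,
}

# URL-encode the parameters so they can be appended to a select endpoint.
query_string = urlencode(params)
print(query_string)
```

Only `q.op` changes between the "either field matches" and "both fields must match" variants; the `q` string itself stays the same.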
So it will be something like this:
{
"q": "field_to_query:(text_a text_b)",
"q.op": "OR",
"start": 0,
"rows": "5",
"fq": [
"metadata_status:Active",
"collection:ADC",
"{!field f=bbox score=overlapRatio}Within(ENVELOPE_1)",
"{!field f=bbox score=overlapRatio}Within(ENVELOPE_2)"
]
}
I am a bit unsure how the bbox filters will work. I will look into it.
After investigating a bit more, the correct Solr query for the CSW query in this issue is:
{
"q": "(title:wind && _query_:\"{!field f=bbox}Within(ENVELOPE(13.50,20.24,78.03,76.48))\") || (abstract:ice && _query_:\"{!field f=bbox}Within(ENVELOPE(17.45,28.63,80.92,78.32))\")",
"q.op": "OR",
"start": 0,
"rows": "5",
"fq": [
"metadata_status:Active",
"collection:ADC"
]
}
This will return all documents that have the word wind in the title field and are inside the bounding box ENVELOPE(13.50,20.24,78.03,76.48), and also all documents that have the word ice in the abstract field and are inside the bounding box ENVELOPE(17.45,28.63,80.92,78.32).
So, as pseudocode:
{
"q": "(<FIELD_TO_QUERY>:<TEXT_A> && _query_:\"{!field f=bbox}Within(<ENVELOPE_1>)\") || (<FIELD_TO_QUERY>:<TEXT_B> && _query_:\"{!field f=bbox}Within(<ENVELOPE_2>)\")",
"q.op": "OR",
"start": 0,
"rows": "5",
"fq": [
"metadata_status:Active",
"collection:ADC"
]
}
The && can be replaced by AND, and || by OR. It is a matter of taste.
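The template above can be generated rather than written by hand. The following is a minimal Python sketch, not a definitive implementation: the helper names `bbox_clause`, `build_q` and `build_request` are hypothetical, and it assumes field names and search terms contain no characters that need Solr escaping. Envelopes are passed as the four numbers that go inside ENVELOPE(...), in the same order as in the examples in this thread.

```python
import json


def bbox_clause(envelope, field="bbox"):
    """Build a nested-query clause: _query_:"{!field f=bbox}Within(ENVELOPE(...))"."""
    coords = ",".join(str(value) for value in envelope)
    inner = f"{{!field f={field}}}Within(ENVELOPE({coords}))"
    return f'_query_:"{inner}"'


def build_q(clauses):
    """Combine (field, text, envelope) triples as (field:text AND bbox) OR (...)."""
    parts = [
        f"({field}:{text} AND {bbox_clause(envelope)})"
        for field, text, envelope in clauses
    ]
    return " OR ".join(parts)


def build_request(clauses, rows=5):
    """Assemble the request body; the fq values are the ones used in this thread."""
    return {
        "q": build_q(clauses),
        "q.op": "OR",
        "start": 0,
        "rows": rows,
        "fq": ["metadata_status:Active", "collection:ADC"],
    }


example = build_request(
    [
        ("title", "protected", (60.0, 90.0, 180.0, 0.0)),
        ("title", "leaf", (65.0, 90.0, 180.0, 0.0)),
    ]
)
# json.dumps escapes the inner double quotes, which hand-written JSON also needs.
print(json.dumps(example, indent=2))
```

Serializing with `json.dumps` takes care of escaping the quotes around the nested `_query_` value, which is easy to get wrong when the JSON is typed manually.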
@magnarem
I have tested both queries:
{
"q": "(title:protected && _query_:\"{!field f=bbox score=overlapRatio}Within(ENVELOPE(60.0,90.0,180.0,0.0))\") OR (title:leaf && _query_:\"{!field f=bbox score=overlapRatio}Within(ENVELOPE(65.0,90.0,180.0,0.0))\")",
"q.op": "OR",
"start": 0,
"rows": "5",
"fq": [
"metadata_status:Active",
"collection:(ADC)"
]
}
and:
{
"q": "*:*",
"q.op": "OR",
"start": 0,
"rows": "5",
"fq": [
"metadata_status:Active",
"collection:ADC",
"title:(protected leaf)",
"{!field f=bbox score=overlapRatio}Within(ENVELOPE(65.0,90.0,180.0,0.0))",
"{!field f=bbox score=overlapRatio}Within(ENVELOPE(60.0,90.0,180.0,0.0))"
]
}
They both return the same result (1 record). Can you confirm that the two queries above are equivalent?
@epifanio The queries give the same result, but they are not necessarily equivalent.
See here for the difference between the q parameter and the fq parameter.
So it is the first query, where the conditions are placed in the q parameter, that is logically closest to the CSW XML query.
However, this example is not a good test, because no document in the index matches the second part of the query:
(title:leaf && _query_:"{!field f=bbox score=overlapRatio}Within(ENVELOPE(65.0,90.0,180.0,0.0))")
(http://SOLR/solr/adc/select?debugQuery=true&fl=id%2Ctitle&fq=collection%3A(ADC)&fq=metadata_status%3AActive&indent=true&q.op=OR&q=(title%3Aleaf%20%26%26%20_query_%3A%22%7B!field%20f%3Dbbox%20score%3DoverlapRatio%7DWithin(ENVELOPE(65.0%2C90.0%2C180.0%2C0.0))%22)&rows=5)
So it is not really a way to check the difference, since only the first part of the query matches anything.
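To make the difference concrete: each fq entry restricts the result set further, so the fq-based request only keeps documents that pass every filter, including both Within filters, while the q-based request accepts a document that satisfies either (field, bbox) pair. The sketch below is a toy in-memory model in Python, not Solr itself. The documents and the names `box_1` and `box_2` are made up for illustration, and the title filter is assumed to match either word.

```python
# Hypothetical documents: title words plus the boxes each document lies within.
docs = {
    "d1": {"title": {"protected", "area"}, "inside": {"box_1", "box_2"}},
    "d2": {"title": {"leaf"}, "inside": {"box_2"}},
    "d3": {"title": {"protected"}, "inside": {"box_1"}},
}


def match(predicate):
    """Return the ids of all documents for which the predicate holds."""
    return {doc_id for doc_id, doc in docs.items() if predicate(doc)}


def q_style():
    """(title:protected AND within box_1) OR (title:leaf AND within box_2)."""
    return match(
        lambda d: ("protected" in d["title"] and "box_1" in d["inside"])
        or ("leaf" in d["title"] and "box_2" in d["inside"])
    )


def fq_style():
    """q=*:* with three filters; every filter is intersected with the result."""
    result = set(docs)
    result &= match(lambda d: bool(d["title"] & {"protected", "leaf"}))
    result &= match(lambda d: "box_1" in d["inside"])
    result &= match(lambda d: "box_2" in d["inside"])
    return result


print(sorted(q_style()))   # documents matching either pair
print(sorted(fq_style()))  # documents passing all filters at once
```

In this toy data the q-style query matches all three documents, while the fq-style query keeps only `d1`, the one document that lies in both boxes. The two requests agree only when the index happens to contain no document that separates them.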
I've implemented the code for both, so I will prioritize the sequence of field AND bbox clauses joined by OR in the main q parameter, as in:
{
"q": "(title:protected && _query_:\"{!field f=bbox score=overlapRatio}Within(ENVELOPE(60.0,90.0,180.0,0.0))\") OR (title:leaf && _query_:\"{!field f=bbox score=overlapRatio}Within(ENVELOPE(65.0,90.0,180.0,0.0))\")",
"q.op": "OR",
"start": 0,
"rows": "5",
"fq": [
"metadata_status:Active",
"collection:(ADC)"
]
}
We need to add support for multiple queries like: