Open fvicent opened 1 year ago
@fvicent as you already noticed, the dc:subject filter is not backed by a real keyword field, so this is a feature request. What is wrong is the returned error "invalid query syntax". "Unsupported query syntax" would probably be more appropriate.
Would you mind changing the title to "CSW search by keyword (dc:subject) is unsupported"?
@giohappy I am willing to work on this if you are ok with this addition. If I am right, this is somewhat similar to the csw:AnyText case, whose corresponding field csw_anytext gets populated whenever a dataset or document is saved, in order to support search via CSW. The same could be done for keywords. Or, maybe better, the repository could build a custom database query when dc:subject is requested for search.
@fvicent that would be great! I would certainly go for the second option, i.e. manipulating the query executed by the PyCSW backend. I invite you to take a look at the CatalogueBackend since there are already parts related to keywords (and links).
I would move the label back from feature to bug, since dc:subject is one of the core CSW queryables.
Please note that GeoNode declares dc:subject as a queryable; having GeoNode/pycsw not accept such a filter is indeed a bug.
Good point @etj, you're right.
I've managed to get this working with Postgres by tweaking the SQL filter (keyword_csv = %s) generated by pycsw and executed in GeoNodeRepository.query():
index 7c951b235..d3eadf485 100644
--- a/geonode/catalogue/backends/pycsw_plugin.py
+++ b/geonode/catalogue/backends/pycsw_plugin.py
@@ -41,6 +41,15 @@ GEONODE_SERVICE_TYPES = {
     "urn:x-esri:serviceType:ArcGIS:ImageServer": "ESRI:ArcGIS:ImageServer",
 }
+SELECT_KEYWORD_CSV_QUERY = """\
+(SELECT STRING_AGG("base_hierarchicalkeyword"."slug", ',')
+FROM "base_taggedcontentitem"
+INNER JOIN "base_hierarchicalkeyword"
+ON ("base_taggedcontentitem"."tag_id" = "base_hierarchicalkeyword"."id")
+WHERE "base_taggedcontentitem"."content_object_id" = "base_resourcebase"."id"
+GROUP BY "base_resourcebase"."id")\
+"""
+
 class GeoNodeRepository(Repository):
     """
@@ -156,6 +165,10 @@ class GeoNodeRepository(Repository):
         pycsw_filters = settings.PYCSW.get("FILTER", {"resource_type__in": ["dataset"]})
         if "where" in constraint:  # GetRecords with constraint
+            constraint["where"] = (
+                constraint["where"]
+                .replace("keyword_csv", SELECT_KEYWORD_CSV_QUERY)
+            )
             query = self._get_repo_filter(ResourceBase.objects.filter(**pycsw_filters)).extra(
                 where=[constraint["where"]], params=constraint["values"]
             )
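To make the rewrite concrete, here is a minimal standalone sketch of what happens to the filter string before it is handed to .extra(). The sample constraint dict and the rewrite_where helper are hypothetical names used only for illustration, and the regex word boundary is a small variation on the plain str.replace() from the patch, not part of it. It assumes pycsw passes the filter as a SQL string with %s placeholders, as described above.

```python
import re

# Same correlated subquery as in the patch, copied here so the sketch is standalone.
SELECT_KEYWORD_CSV_QUERY = """\
(SELECT STRING_AGG("base_hierarchicalkeyword"."slug", ',')
FROM "base_taggedcontentitem"
INNER JOIN "base_hierarchicalkeyword"
ON ("base_taggedcontentitem"."tag_id" = "base_hierarchicalkeyword"."id")
WHERE "base_taggedcontentitem"."content_object_id" = "base_resourcebase"."id"
GROUP BY "base_resourcebase"."id")\
"""


def rewrite_where(where: str) -> str:
    """Swap the bare keyword_csv name for the correlated subquery (hypothetical helper)."""
    # The word boundary leaves identifiers that merely contain "keyword_csv" alone.
    # A callable replacement avoids any backslash processing of the SQL text.
    return re.sub(r"\bkeyword_csv\b", lambda _match: SELECT_KEYWORD_CSV_QUERY, where)


# Hypothetical constraint, shaped like the dict used in GeoNodeRepository.query().
constraint = {"where": "keyword_csv = %s", "values": ["rivers"]}
rewritten = rewrite_where(constraint["where"])
print(rewritten)
```

Only the SQL text changes; the %s placeholder and the values list stay as pycsw produced them, so the parameters are still bound by the database driver.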
It would be much simpler to use Django's ORM instead of plain SQL, but having filters built and passed by pycsw as SQL code is a serious impediment. I was thinking of something like this, which unfortunately won't work:
index 7c951b235..c48eeb120 100644
--- a/geonode/catalogue/backends/pycsw_plugin.py
+++ b/geonode/catalogue/backends/pycsw_plugin.py
@@ -19,6 +19,7 @@
 import logging
+from django.contrib.postgres.aggregates import StringAgg
 from django.db import connection
 from django.db.models import Max, Min, Count
 from django.conf import settings
@@ -156,8 +157,19 @@ class GeoNodeRepository(Repository):
         pycsw_filters = settings.PYCSW.get("FILTER", {"resource_type__in": ["dataset"]})
         if "where" in constraint:  # GetRecords with constraint
-            query = self._get_repo_filter(ResourceBase.objects.filter(**pycsw_filters)).extra(
-                where=[constraint["where"]], params=constraint["values"]
+            query = (
+                self._get_repo_filter(
+                    ResourceBase.objects.filter(**pycsw_filters)
+                )
+                .annotate(
+                    keyword_csv=StringAgg(
+                        "keywords__slug",
+                        delimiter=","
+                    )
+                )
+                .extra(
+                    where=[constraint["where"]], params=constraint["values"]
+                )
             )
         else:  # GetRecords sans constraint
             query = self._get_repo_filter(ResourceBase.objects.filter(**pycsw_filters))
It seems that annotated fields can't be used within .extra().
Another option would be to rewrite the query() method and get rid of Django's ORM stuff, using plain SQL instead. Then building a query like this one would be much easier:
SELECT * FROM (
    SELECT "base_resourcebase"."id",
           "base_resourcebase"."title",
           "base_resourcebase"."resource_type",
           STRING_AGG("base_hierarchicalkeyword"."slug", ',') AS keyword_csv
    FROM "base_resourcebase"
    INNER JOIN "base_taggedcontentitem" ON ("base_resourcebase"."id" = "base_taggedcontentitem"."content_object_id")
    INNER JOIN "base_hierarchicalkeyword" ON ("base_taggedcontentitem"."tag_id" = "base_hierarchicalkeyword"."id")
        AND "base_resourcebase"."resource_type" = 'layer'
    GROUP BY "base_resourcebase"."id"
) q
WHERE keyword_csv = 'something'; -- <--- Filters passed by pycsw
I couldn't get Django's ORM to generate similar SQL.
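For anyone who wants to try the wrapped-subquery pattern without a GeoNode database, here is a minimal sketch using Python's built-in sqlite3 module. It is not GeoNode code: the three tables are simplified, hypothetical stand-ins for base_resourcebase, base_hierarchicalkeyword and base_taggedcontentitem, SQLite's GROUP_CONCAT is swapped in for Postgres's STRING_AGG, and the placeholder is ? rather than %s.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified, hypothetical stand-ins for the GeoNode tables named in the query above.
conn.executescript(
    """
    CREATE TABLE resourcebase (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE keyword (id INTEGER PRIMARY KEY, slug TEXT);
    CREATE TABLE tagged (content_object_id INTEGER, tag_id INTEGER);
    INSERT INTO resourcebase VALUES (1, 'Rivers'), (2, 'Roads');
    INSERT INTO keyword VALUES (1, 'water'), (2, 'transport');
    INSERT INTO tagged VALUES (1, 1), (2, 2);
    """
)

# Inner query: one row per resource, with its keyword slugs aggregated into
# a keyword_csv column (GROUP_CONCAT plays the role of STRING_AGG here).
inner = """
    SELECT r.id, r.title, GROUP_CONCAT(k.slug, ',') AS keyword_csv
    FROM resourcebase r
    JOIN tagged t ON r.id = t.content_object_id
    JOIN keyword k ON t.tag_id = k.id
    GROUP BY r.id
"""

# Outer query: the filter can now treat keyword_csv like an ordinary column,
# which is the shape of filter pycsw passes in.
where = "keyword_csv = ?"
rows = conn.execute(
    f"SELECT id, title FROM ({inner}) q WHERE {where}", ["water"]
).fetchall()
print(rows)
```

Because the aggregation happens in the inner query, the outer WHERE clause needs no knowledge of how keyword_csv is computed.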
@giohappy @etj Thoughts on this?
Expected Behavior
You should be able to filter by dc:subject (= keywords in pycsw and GeoNode) within a CSW query.
Actual Behavior
A simple query filtering by keyword returns "Invalid query syntax".
Steps to Reproduce the Problem
Make a GetRecords request against the CSW endpoint:
With the following payload.xml:
Response:
But the syntax is valid, since replacing dc:subject by dc:title works as expected.
This is the exception reported by pycsw:
Which makes sense since keyword_csv is not a table field but a model property:
https://github.com/GeoNode/geonode/blob/f3a490a2c2b38975351a887af720e7b7282b385e/geonode/base/models.py#L1339-L1348
Specifications