ckan / ckanext-spatial

Geospatial extension for CKAN
http://docs.ckan.org/projects/ckanext-spatial

Remove coupled resources from solr #285

Closed Zharktas closed 2 years ago

Zharktas commented 2 years ago

Coupled resources are a list of resources that link a CSW server URL and a UUID together. They are not needed in Solr, as they are hardly searchable and can cause a Solr indexing error if a single harvested dataset contains many of them. This is based on the discussion in https://github.com/ckan/ckan/pull/4825

amercader commented 2 years ago

This looks good @Zharktas. Should we also pop the actual spatial field? It is a big GeoJSON blob that is not really needed in Solr either.

FuhuXia commented 2 years ago

@amercader Spatial search with the Solr backend relies on the GeoJSON data in the spatial field, so we should not pop it. See http://docs.ckan.org/projects/ckanext-spatial/en/latest/spatial-search.html

amercader commented 2 years ago

@FuhuXia the plugin uses the spatial field to do the necessary calculations and index the relevant Solr fields (e.g. bbox_area and maxx when using the solr backend, and spatial_geom when using the solr-spatial-field one), but it doesn't require the actual contents of the spatial field to be indexed in Solr.
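As a rough illustration of that point (not the plugin's actual code), the derived bounding-box values can be computed from the GeoJSON first, after which the raw blob no longer needs to be sent to Solr. The helper name `index_bbox_fields`, the `minx`/`miny`/`maxy` keys and the width-times-height area formula are assumptions made for this sketch; only `bbox_area`, `maxx` and the `spatial` field are named in the thread.

```python
import json


def _iter_positions(coords):
    """Yield (x, y) pairs from arbitrarily nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        # A single position: [x, y] or [x, y, z]
        yield coords[0], coords[1]
    else:
        for item in coords:
            yield from _iter_positions(item)


def index_bbox_fields(pkg_dict):
    """Derive bbox fields from pkg_dict['spatial'], then drop the raw GeoJSON.

    Hypothetical helper: key names other than 'spatial', 'maxx' and
    'bbox_area' are assumptions for illustration.
    """
    raw = pkg_dict.pop("spatial", None)
    if not raw:
        return pkg_dict

    geometry = json.loads(raw)
    positions = list(_iter_positions(geometry.get("coordinates", [])))
    if not positions:
        return pkg_dict

    xs, ys = zip(*positions)
    pkg_dict["minx"], pkg_dict["maxx"] = min(xs), max(xs)
    pkg_dict["miny"], pkg_dict["maxy"] = min(ys), max(ys)
    # Assumed area definition: width * height of the bounding box
    pkg_dict["bbox_area"] = (max(xs) - min(xs)) * (max(ys) - min(ys))
    return pkg_dict
```

Because the GeoJSON is popped before the derived values are written, the dict handed on for indexing carries only the small numeric fields.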

Zharktas commented 2 years ago

@amercader done

Zharktas commented 2 years ago

I noticed that the fields are still indexed under extras_coupled-resource and extras_spatial, but since extras_ is of type text in the schema, indexing doesn't fail. However, should those be removed as well, since they are not really useful data to search on?

amercader commented 2 years ago

@Zharktas sorry I missed this. I'm not sure we can "remove" extras_* fields in the before_index() hook, as these are dynamic fields in Solr. However, if coupled-resource and spatial are removed before indexing, the extras_* variant won't be created by Solr, right?
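A minimal sketch of the removal being discussed, written as a plain function on the dict that a before_index() hook receives. The function name `strip_unindexed_fields` is hypothetical, and whether the `extras_`-prefixed keys are already present in the dict when the hook runs is an assumption; the sketch simply pops them defensively so that neither variant reaches Solr.

```python
# Field names taken from this thread.
FIELDS_TO_DROP = ("coupled-resource", "spatial")


def strip_unindexed_fields(pkg_dict):
    """Remove fields that should not be indexed, plus their extras_ variants.

    Assumption: if an extras_<name> key exists in the dict at this point,
    dropping it here keeps the matching dynamic field from being populated.
    """
    for name in FIELDS_TO_DROP:
        pkg_dict.pop(name, None)
        pkg_dict.pop("extras_" + name, None)
    return pkg_dict
```

Using `pop` with a default keeps the function safe for datasets that never had these fields in the first place.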