geopython / pycsw

pycsw is an OGC CSW server implementation written in Python. pycsw fully implements the OpenGIS Catalogue Service Implementation Specification [Catalogue Service for the Web]. Initial development started in 2010 (more formally announced in 2011). The project is certified OGC Compliant, and is an OGC Reference Implementation. pycsw allows for the publishing and discovery of geospatial metadata via numerous APIs (CSW 2/CSW 3, OpenSearch, OAI-PMH, SRU). Existing repositories of geospatial metadata can also be exposed, providing a standards-based metadata and catalogue component of spatial data infrastructures. pycsw is Open Source, released under an MIT license, and runs on all major platforms (Windows, Linux, Mac OS X). Please read the docs at https://pycsw.org/docs for more information.
https://pycsw.org
MIT License

GetRecords handling should not filter records based on typenames value #105

Closed tomkralidis closed 11 years ago

tomkralidis commented 11 years ago

The current behaviour for handling GetRecords.typename is to filter records based on typename before applying any OGC filters to the query. Example:

As it happens, GetRecords queries (with or without a filter) should always query against all metadata, based on the typeNames schema, and return all results encoded using the outputSchema (which we already do). This is confirmed by @smrAzGS's comments as well as by the CSW spec authors.

So in the codebase, we need to remove the part of the repository query which initially filters by typename so that the entire repository is searched and not filtered by typenames.
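To make the proposed change concrete, here is a minimal sketch using an in-memory SQLite table. The table layout, column names, and the `search` helper are hypothetical and are not pycsw's actual repository code; the `filter_by_typename` flag exists only to contrast the old and new behaviour.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory repository holding records of mixed typenames.

    The schema here is an assumption made for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (identifier TEXT, typename TEXT, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?)",
        [
            ("a", "csw:Record", "Lakes of Ontario"),
            ("b", "gmd:MD_Metadata", "Lakes of Quebec"),
            ("c", "gmd:MD_Metadata", "Roads of Ontario"),
        ],
    )
    return conn


def search(conn, typenames, title_pattern, filter_by_typename):
    """Return matching identifiers for a title constraint.

    filter_by_typename=True mimics the old behaviour (pre-filter on
    typename); False searches the whole repository, as proposed.
    """
    clauses = ["title LIKE ?"]
    params = [title_pattern]
    if filter_by_typename and typenames:
        placeholders = ", ".join("?" for _ in typenames)
        clauses.append(f"typename IN ({placeholders})")
        params.extend(typenames)
    sql = (
        "SELECT identifier FROM records WHERE "
        + " AND ".join(clauses)
        + " ORDER BY identifier"
    )
    return [row[0] for row in conn.execute(sql, params)]


conn = make_demo_db()
old_result = search(conn, ["csw:Record"], "%Lakes%", filter_by_typename=True)
new_result = search(conn, ["csw:Record"], "%Lakes%", filter_by_typename=False)
```

With the pre-filter in place only record `a` is found; without it, the ISO record `b` is found as well, which is the behaviour described above.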

@rclark / @smrAzGS: does this make sense?

smrgeoinfo commented 11 years ago

Tom, makes sense to me. I’d adjust the ‘As it happens,…’ paragraph to say ‘query against all metadata, based on the typeNames schema, and return all results encoded using the outputSchema’. I’m pretty sure most implementations don’t behave this way, because to actually implement it, one has to map the query elements from the typeName schema to each metadata schema used in the catalog, and has to transform records from whatever schema is stored in the dB to the output schema.

To address this issue, GeoPortal maps incoming metadata record elements to Lucene index elements and builds Lucene indexes for each queryable property (I think GeoNetwork does the same). GeoPortal doesn’t actually honor the outputSchema parameter. GeoNetwork provides output XSLTs to transform the XML blob from the dB into the requested outputSchema if it differs from the schema of the stored XML blob.

Another solution to the problem is used by Deegree: marshal harvested metadata to a relational dB schema, map incoming requests from whatever typeName schema is supported to SQL against the relational dB, and have routines to build XML output in any supported outputSchema.
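The request-mapping step of the relational approach described above can be sketched roughly as follows. The queryable-to-column table and the helper names are assumptions made for illustration; they are not taken from Deegree or pycsw.

```python
# Hypothetical mapping from queryable names (per typeName schema) to the
# relational columns that harvested metadata was marshalled into.
QUERYABLES = {
    "csw:Record": {
        "dc:title": "title",
        "dc:subject": "keywords",
        "dct:abstract": "abstract",
    },
    "gmd:MD_Metadata": {
        "apiso:Title": "title",
        "apiso:Subject": "keywords",
        "apiso:Abstract": "abstract",
    },
}


def property_to_column(typename, property_name):
    """Resolve a queryable from the request's typeName schema to a column."""
    try:
        return QUERYABLES[typename][property_name]
    except KeyError:
        raise ValueError(
            f"{property_name!r} is not a queryable of {typename!r}"
        ) from None


def like_constraint(typename, property_name, pattern):
    """Build a parameterised SQL fragment for a simple LIKE filter."""
    column = property_to_column(typename, property_name)
    return f"{column} LIKE ?", [pattern]


# Both schemas resolve to the same column, so the same SQL searches every
# record regardless of the schema it was harvested in.
dc_sql = like_constraint("csw:Record", "dc:title", "%lake%")
iso_sql = like_constraint("gmd:MD_Metadata", "apiso:Title", "%lake%")
```

Because the typeName only selects which vocabulary the filter is written in, it never needs to restrict which rows are searched.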

As I interpret the CSW spec, if the capabilities list an outputSchema, then the server needs to be able to provide any record in its metadata store in that schema.

steve



tomkralidis commented 11 years ago

Thanks @smrAzGS. Updated. Will have this implemented by end of week.

tomkralidis commented 11 years ago

Hi @smrAzGS, thanks for the additional implementation comments. FYI pycsw does it the deegree way, and we write to any outputschema in the same way (in Python; we refuse to use XSLT). We shred the XML into DB columns and also keep the actual XML representation on hand, which is returned directly, as an early out, when the requested outputschema matches the XML representation stored in the DB column and elementsetname=full.
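A rough sketch of that early out, for readers following along. The record dictionary, its keys, and `write_record` are hypothetical and are not pycsw's real internals; the fallback builder handles only a Dublin Core style CSW record with two fields.

```python
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"
DC_NS = "http://purl.org/dc/elements/1.1/"
GMD_NS = "http://www.isotc211.org/2005/gmd"

# Root element name used for each elementsetname in CSW output.
ROOT_BY_ELEMENTSETNAME = {
    "brief": "BriefRecord",
    "summary": "SummaryRecord",
    "full": "Record",
}


def write_record(record, outputschema, elementsetname):
    """Serialise one repository row in the requested outputschema.

    `record` is an assumed dict with 'identifier', 'title', 'schema'
    (namespace of the stored XML) and 'xml' (the stored document).
    """
    # Early out: the stored XML is already in the requested schema and the
    # full record was asked for, so return it untouched.
    if elementsetname == "full" and record["schema"] == outputschema:
        return record["xml"]

    if outputschema != CSW_NS:
        raise ValueError(f"unsupported outputschema in this sketch: {outputschema}")

    # Otherwise build the output from the shredded columns.
    root = ET.Element(f"{{{CSW_NS}}}{ROOT_BY_ELEMENTSETNAME[elementsetname]}")
    ET.SubElement(root, f"{{{DC_NS}}}identifier").text = record["identifier"]
    ET.SubElement(root, f"{{{DC_NS}}}title").text = record["title"]
    return ET.tostring(root, encoding="unicode")


iso_record = {
    "identifier": "b",
    "title": "Lakes of Quebec",
    "schema": GMD_NS,
    "xml": f'<gmd:MD_Metadata xmlns:gmd="{GMD_NS}"/>',
}
stored = write_record(iso_record, GMD_NS, "full")   # stored XML, unchanged
brief = write_record(iso_record, CSW_NS, "brief")   # rebuilt from columns
```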

tomkralidis commented 11 years ago

FYI fixed in master and 1.4 branch.