geopython / pycsw

pycsw is an OGC CSW server implementation written in Python. pycsw fully implements the OpenGIS Catalogue Service Implementation Specification [Catalogue Service for the Web]. Initial development started in 2010 (more formally announced in 2011). The project is certified OGC Compliant, and is an OGC Reference Implementation. pycsw allows for the publishing and discovery of geospatial metadata via numerous APIs (CSW 2/CSW 3, OpenSearch, OAI-PMH, SRU). Existing repositories of geospatial metadata can also be exposed, providing a standards-based metadata and catalogue component of spatial data infrastructures. pycsw is Open Source, released under an MIT license, and runs on all major platforms (Windows, Linux, Mac OS X). Please read the docs at https://pycsw.org/docs for more information.
https://pycsw.org
MIT License

ogc api records 404 if uuid contains '/' or '%2F' #848

Open pvgenuchten opened 1 year ago

pvgenuchten commented 1 year ago

Description

Some communities tend to place a DOI in the metadata identifier, e.g. 10.5281/zenodo.4088113. If you navigate to this item using /collections/metadata:main/items/10.5281/zenodo.4088113, a 404 is returned. The same error occurs with /collections/metadata:main/items/10.5281%2Fzenodo.4088113 (URL-encoded).

I kind of expected option 2 (the URL-encoded form) to work fine, in which case we would have to make sure we always encode '/' as %2F. Alternatives could be to prevent '/' in identifiers by substituting '-', or to throw an error on insert.
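For illustration, the "always encode" option could look like the sketch below on the client side. It uses only Python's standard `urllib.parse`; the helper name `item_url` and the base URL are hypothetical and not part of pycsw. Note that encoding alone may not be enough: many WSGI servers decode %2F back to '/' in the path before the application sees it, which might be why the encoded form also returns 404.

```python
from urllib.parse import quote, unquote


def item_url(base, collection, identifier):
    """Build an items URL with the identifier fully percent-encoded.

    Hypothetical helper for illustration only (not a pycsw function).
    """
    # safe="" makes quote() encode "/" as %2F; by default "/" is left alone.
    encoded_id = quote(identifier, safe="")
    # Keep ":" readable in the collection name (e.g. metadata:main).
    encoded_collection = quote(collection, safe=":")
    return f"{base}/collections/{encoded_collection}/items/{encoded_id}"


doi = "10.5281/zenodo.4088113"
url = item_url("https://example.org", "metadata:main", doi)
print(url)

# The identifier round-trips once the last path segment is decoded.
last_segment = url.rsplit("/", 1)[1]
print(unquote(last_segment))
```

Because the encoded identifier contains no literal '/', it occupies a single path segment, which is what a route such as /collections/{collectionId}/items/{itemId} would need to match it.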

pvgenuchten commented 7 months ago

This problem actually exists on the demo server: https://demo.pycsw.org/gisdata/collections/metadata:main/items/http%3A%2F%2Fcapita.wustl.edu%2FDataspaceMetadata_ISO%2FCIRA.VIEWS.BRf.xml

I was curious whether the problem also exists on pygeoapi, but there it seems to be covered. Maybe it is handled by the Flask API?

pvgenuchten commented 4 months ago

seems the problem still exists:

tomkralidis commented 2 months ago

@pvgenuchten I cannot reproduce this issue locally. I've tried inspecting on demo.pycsw.org directly, and found the following.

Given a URL like:

https://demo.pycsw.org/gisdata/collections//metadata:main/items/http://capita.wustl.edu/DataspaceMetadata_ISO/CIRA.VIEWS.MF.xml

And the below pycsw container logs on demo.pycsw.org:

[2024-08-03T19:16:39Z] {/home/pycsw/pycsw/pycsw/ogc/api/records.py:837} DEBUG - Querying repository for item http:/capita.wustl.edu/DataspaceMetadata_ISO/CIRA.VIEWS.MF.xml

Here, we see that http://capita.wustl.edu/DataspaceMetadata_ISO/CIRA.VIEWS.MF.xml is getting converted to http:/capita.wustl.edu/DataspaceMetadata_ISO/CIRA.VIEWS.MF.xml.

I tried adjusting the nginx setup on demo.pycsw.org with merge_slashes off; but no luck.

In this case I would say things are working at the application level as expected.
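The conversion seen in the log can be reproduced with a small sketch that collapses repeated slashes in a request path, which is roughly what a slash-merging proxy or server layer does. The function `merge_slashes` below is a hypothetical stand-in written for illustration; it is not nginx's or pycsw's actual code, and which layer on demo.pycsw.org does the merging is not established here.

```python
import re


def merge_slashes(path):
    """Collapse any run of two or more slashes into a single slash.

    Hypothetical stand-in for a slash-merging proxy or server layer.
    """
    return re.sub(r"/{2,}", "/", path)


requested = (
    "/collections/metadata:main/items/"
    "http://capita.wustl.edu/DataspaceMetadata_ISO/CIRA.VIEWS.MF.xml"
)

merged = merge_slashes(requested)

# What the application would extract as the item identifier after merging.
item_id = merged.split("/items/", 1)[1]
print(item_id)
```

Under this assumption the application receives an identifier beginning with http:/ rather than http://, so a repository lookup for the stored identifier finds nothing and a 404 follows, even though the application-level query itself behaves as designed.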

cc @kalxas @ricardogsilva

pvgenuchten commented 1 month ago

Should we close it, or wait for the demo server to be reconfigured? @kalxas