geopython / pycsw

pycsw is an OGC CSW server implementation written in Python. pycsw fully implements the OpenGIS Catalogue Service Implementation Specification [Catalogue Service for the Web]. Initial development started in 2010 (more formally announced in 2011). The project is certified OGC Compliant, and is an OGC Reference Implementation. pycsw allows for the publishing and discovery of geospatial metadata via numerous APIs (CSW 2/CSW 3, OpenSearch, OAI-PMH, SRU). Existing repositories of geospatial metadata can also be exposed, providing a standards-based metadata and catalogue component of spatial data infrastructures. pycsw is Open Source, released under an MIT license, and runs on all major platforms (Windows, Linux, Mac OS X). Please read the docs at https://pycsw.org/docs for more information.
https://pycsw.org
MIT License

POST of GeoJSON with no 'links' makes catalogue fail #1035

Open spinto opened 1 month ago

spinto commented 1 month ago

Description

If you POST a GeoJSON item that does not contain a links field to the new transaction endpoint /collections/{collectionId}/items, for example:

{
  "type": "Feature",
  "stac_version": "1.0.0",
  "stac_extensions": [
    "https://stac-extensions.github.io/alternate-assets/v1.1.0/schema.json",
    "https://stac-extensions.github.io/storage/v1.0.0/schema.json"
  ],
  "id": "S3A_OPER_AUX_GNSSRD_POD__20171212T193142_V20160223T235943_20160224T225600",
  "properties": {
    "datetime": "2015-05-19T12:00:00.000000Z"
  },
  "assets": {
    "PRODUCT": {
      "href": "AUX/AUX_GNSSRD/2016/02/24/S3A_OPER_AUX_GNSSRD_POD__20171212T193142_V20160223T235943_20160224T225600",
      "title": "Product",
      "type": "application/octet-stream"
    }
  }
}

the product is ingested successfully (the call returns 201 Created), but the catalogue then fails to retrieve items, with the error:

[2024-10-21T12:53:39Z] {/usr/local/lib/python3.10/site-packages/flask/app.py:838} ERROR - Exception on /collections/metadata:main/items [GET]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1473, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 882, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "/home/pycsw/pycsw/pycsw/wsgi_flask.py", line 220, in items
    return get_response(api_.items(dict(request.headers),
  File "/home/pycsw/pycsw/pycsw/ogc/api/records.py", line 730, in items
    response['features'].append(record2json(record, self.config['server']['url'], collection, self.mode))
  File "/home/pycsw/pycsw/pycsw/ogc/api/records.py", line 1128, in record2json
    rec['links'].append(value)
KeyError: 'links'

NOTE: The GeoJSON above is indeed invalid: links is mandatory, although it can be empty. The catalogue should therefore either reject the item, or tolerate it and assume an empty "links": []. With the current behavior, a bad GeoJSON pushed by mistake can make the entire catalogue query fail.
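
The tolerant option could look roughly like the sketch below. This is only an illustration, not pycsw's actual code: append_link is a hypothetical helper standing in for the failing line rec['links'].append(value) from the traceback, and the example link is made up.

```python
def append_link(rec: dict, value: dict) -> dict:
    """Append a link to a record, treating a missing or non-list 'links' as empty."""
    # Hypothetical helper, not part of pycsw: it guards the append that
    # currently raises KeyError when the stored item has no 'links' member.
    if not isinstance(rec.get("links"), list):
        rec["links"] = []
    rec["links"].append(value)
    return rec


# An item stored without 'links', similar to the one in this report.
item = {"type": "Feature", "id": "example", "properties": {}}
append_link(item, {"rel": "self", "href": "https://example.org/items/example"})
print(item["links"])
```

With a guard like this, a single malformed record would no longer break the items listing for the whole collection.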

Environment

geopython/pycsw:eoepca-2.0.0-beta1

Steps to Reproduce

POST to /collections/{collectionId}/items the following GeoJSON:

{
  "type": "Feature",
  "stac_version": "1.0.0",
  "stac_extensions": [
    "https://stac-extensions.github.io/alternate-assets/v1.1.0/schema.json",
    "https://stac-extensions.github.io/storage/v1.0.0/schema.json"
  ],
  "id": "S3A_OPER_AUX_GNSSRD_POD__20171212T193142_V20160223T235943_20160224T225600",
  "properties": {
    "datetime": "2015-05-19T12:00:00.000000Z"
  },
  "assets": {
    "PRODUCT": {
      "href": "AUX/AUX_GNSSRD/2016/02/24/S3A_OPER_AUX_GNSSRD_POD__20171212T193142_V20160223T235943_20160224T225600",
      "title": "Product",
      "type": "application/octet-stream"
    }
  }
}

Then try to GET the items from the catalogue. An Internal Server Error is returned.

If you then delete the item, the error disappears.

Additional Information

The catalogue should check the validity of the GeoJSON before ingesting it, and either check that the links field exists or assume it is empty.
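
A minimal sketch of such a pre-ingest check, covering only the links field discussed here. The function name prepare_item and its strict flag are assumptions made for illustration; they are not existing pycsw functions or options.

```python
def prepare_item(item: dict, strict: bool = False) -> dict:
    """Return a copy of item that is safe to ingest with respect to 'links'.

    strict=True rejects an item without 'links'; strict=False assumes
    an empty list instead. Both names are hypothetical.
    """
    if not isinstance(item, dict) or item.get("type") != "Feature":
        raise ValueError("item must be a GeoJSON Feature object")

    links = item.get("links")
    if links is None:
        if strict:
            raise ValueError("item has no 'links' member")
        links = []  # tolerate the omission and assume no links
    if not isinstance(links, list):
        raise ValueError("'links' must be an array")

    # Shallow copy so the caller's payload is left untouched.
    return {**item, "links": links}


payload = {"type": "Feature", "id": "example", "properties": {}}
print(prepare_item(payload)["links"])

try:
    prepare_item(payload, strict=True)
except ValueError as err:
    print(f"rejected: {err}")
```

In strict mode the transaction endpoint could translate the ValueError into a 400 response rather than returning 201 for an item that later breaks queries.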