geopython / pycsw

pycsw is an OGC CSW server implementation written in Python. pycsw fully implements the OpenGIS Catalogue Service Implementation Specification [Catalogue Service for the Web]. Initial development started in 2010 (more formally announced in 2011). The project is certified OGC Compliant, and is an OGC Reference Implementation. pycsw allows for the publishing and discovery of geospatial metadata via numerous APIs (CSW 2/CSW 3, OpenSearch, OAI-PMH, SRU). Existing repositories of geospatial metadata can also be exposed, providing a standards-based metadata and catalogue component of spatial data infrastructures. pycsw is Open Source, released under an MIT license, and runs on all major platforms (Windows, Linux, Mac OS X). Please read the docs at https://pycsw.org/docs for more information.
https://pycsw.org
MIT License

[records] sort by updated #1033

Open pvgenuchten opened 1 month ago

pvgenuchten commented 1 month ago

Description

When sorting by '-updated' on ogcapi-records, results are returned in a seemingly random order. This may be caused by the sort combining modification_date with date_revision or date.

On the demo server the effect is less obvious because only 2 distinct dates exist, but https://demo.pycsw.org/gisdata/collections/metadata:main/items?sortby=-updated&offset=100 returns the same order as https://demo.pycsw.org/gisdata/collections/metadata:main/items?sortby=updated&offset=100

Steps to Reproduce

Load some data with various dates populated, then order by updated.
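
A minimal sketch of how the comparison could be scripted, assuming the items endpoint returns a GeoJSON FeatureCollection whose features carry an `id`, and that `f=json` selects the JSON representation. The helper names (`fetch_ids`, `ids_in_order`, `same_order`) are hypothetical and not part of pycsw.

```python
import json
from urllib.request import urlopen

# Demo endpoint taken from the description above.
BASE = "https://demo.pycsw.org/gisdata/collections/metadata:main/items"


def ids_in_order(feature_collection):
    """Return the record ids in the order they appear in the response."""
    return [feature["id"] for feature in feature_collection.get("features", [])]


def fetch_ids(sortby, offset=100):
    """Fetch one page of results and return its record ids.

    Assumption: `f=json` requests the JSON representation.
    """
    url = f"{BASE}?sortby={sortby}&offset={offset}&f=json"
    with urlopen(url) as response:
        return ids_in_order(json.load(response))


def same_order(asc_ids, desc_ids):
    """True when both pages list more than one record in identical order."""
    return len(asc_ids) > 1 and asc_ids == desc_ids
```

Calling `same_order(fetch_ids("updated"), fetch_ids("-updated"))` against an affected instance would be expected to return `True`, which suggests the sort direction is not being honoured. If sorting behaved as intended, the two pages would normally differ.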

Additional Information

The relevant code seems to be https://github.com/geopython/pycsw/blob/c532be3aba731bfee093620c40f64160c3901f8e/pycsw/core/repository.py#L151. The insert-date is the same for most of my content, because I imported all records at a single moment. I would expect the modified-date of the metadata record itself to be used (or, if empty, the insert-date). This logic is also applied in https://github.com/geopython/pycsw/blob/c532be3aba731bfee093620c40f64160c3901f8e/pycsw/ogc/api/records.py#L1177, which explains the difference.

The problem here is that date-modified is not always populated. Can we populate it at insert if it is nil?
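
To illustrate the expected fallback (modified-date, or insert-date when it is empty), here is a minimal sketch using an in-memory SQLite table. The table layout, the column names `date_modified` and `insert_date`, the sample rows, and the function `ids_sorted_by_updated` are assumptions made for illustration; this is not pycsw's actual repository code.

```python
import sqlite3

# Hypothetical, simplified stand-in for the records table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records ("
    "identifier TEXT PRIMARY KEY, "
    "date_modified TEXT, "          # may be NULL when the metadata has no modified date
    "insert_date TEXT NOT NULL)"    # identical for a bulk import
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        ("a", "2021-05-01", "2024-01-10"),
        ("b", None, "2024-01-10"),
        ("c", "2023-11-20", "2024-01-10"),
    ],
)


def ids_sorted_by_updated(conn, descending=False):
    """Order by date_modified, falling back to insert_date when it is NULL."""
    direction = "DESC" if descending else "ASC"
    sql = (
        "SELECT identifier FROM records "
        f"ORDER BY COALESCE(date_modified, insert_date) {direction}, "
        f"identifier {direction}"  # tie-breaker keeps the order deterministic
    )
    return [row[0] for row in conn.execute(sql)]


print(ids_sorted_by_updated(conn))                   # ascending
print(ids_sorted_by_updated(conn, descending=True))  # descending
```

Populating date-modified at insert time, as proposed above, would achieve the same ordering without the `COALESCE`, at the cost of storing a value that did not come from the metadata record itself.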

Alternative solution: display the dataset date and order by the dataset date (which is usually more relevant, but not always populated).