SAEON / odp-server

Source code for the SAEON Open Data Platform server components.
GNU Affero General Public License v3.0

Request for 'updatedAt' timestamps at the /api/catalogue endpoint #2

Closed (zachsa closed 1 year ago)

zachsa commented 1 year ago

Hi @marksparkza,

Looking at the docs for the endpoint that retrieves catalogue data (https://odp.saeon.ac.za/api/catalogue), there doesn't seem to be a way to specify a time range, order by updatedAt, or anything similar.

I would like to be able to filter on an updatedAt timestamp, e.g. "give me all the records that have been updated in the last 30 days". One way this could look is GET https://odp.saeon.ac.za/api/catalogue?last=30days

With this implemented, I wouldn't have to re-integrate the whole catalogue every day. This isn't a huge problem at the moment: the integration currently takes less than 30 seconds for roughly 4 000 docs, and will likely take around 300 seconds for 40 000 docs and about 1 hour for 400 000 docs, or thereabouts.
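The estimates above follow from scaling linearly from the measured run (about 30 seconds for about 4 000 docs). A rough sketch of that arithmetic, assuming integration time really does grow linearly with document count; the function name is just for illustration:

```python
def estimate_seconds(doc_count, ref_docs=4_000, ref_seconds=30.0):
    """Linear extrapolation from one reference run (assumed: ~30 s for ~4 000 docs)."""
    return doc_count * ref_seconds / ref_docs


for n in (4_000, 40_000, 400_000):
    secs = estimate_seconds(n)
    print(f"{n:>7} docs: ~{secs:.0f} s (~{secs / 60:.1f} min)")
```

On that assumption 400 000 docs comes out at about 3 000 seconds, i.e. roughly 50 minutes, which matches the "about 1 hour" ballpark.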

I've updated the integration so that the Elasticsearch index is no longer deleted and recreated on every run, so if this is implemented on the ODP side I will be able to use it immediately. See https://github.com/SAEON/data-portal/issues/65

marksparkza commented 1 year ago

Hi Zach,

I will implement this in the v2 API.

zachsa commented 1 year ago

Okay cool. Thanks

marksparkza commented 1 year ago

Hi Zach,

I've implemented this as an updated_since query parameter, which takes a date of the form 2023-05-01.

You can try it out on dev - for example if you use the above date (along with include_retracted=true) you should get just 4 records, 2 updated and 2 retracted.
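For reference, a minimal sketch of how a client might build such a request. The updated_since and include_retracted parameters and the /api/catalog/*/records path come from this thread; the base URL and catalogue ID below are placeholders, and the helper names are made up for illustration:

```python
from datetime import date, timedelta
from urllib.parse import quote, urlencode


def days_ago(today, days):
    """Cutoff date for 'updated in the last N days'."""
    return today - timedelta(days=days)


def build_records_url(base_url, catalog_id, updated_since, include_retracted=False):
    """Build a records URL with an updated_since filter.

    base_url and catalog_id are placeholders (assumptions), not real values.
    """
    # updated_since takes a date of the form 2023-05-01
    params = {"updated_since": updated_since.isoformat()}
    if include_retracted:
        params["include_retracted"] = "true"
    path = f"/api/catalog/{quote(catalog_id, safe='')}/records"
    return f"{base_url.rstrip('/')}{path}?{urlencode(params)}"


# Placeholder host and catalogue ID, for illustration only
url = build_records_url(
    "https://example.org", "SAEON", date(2023, 5, 1), include_retracted=True
)
print(url)
```

A daily integration could pass days_ago(date.today(), 30) (or a shorter window) as updated_since instead of a fixed date.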

Note also that /api/catalog/*/records supports an include_nonsearchable param, which should be set to true for the SAEON catalogue. This will include all the Obs DB instrument records, which you'll see have searchable: false; these should be excluded from search results but should still resolve if linked to directly.
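On the consuming side, one way to handle this is to keep every record resolvable by ID but only feed the searchable ones into search. A rough sketch, assuming each record is a dict carrying the searchable flag mentioned above (treated as true when absent, which is an assumption); the id field and sample values are invented for the example:

```python
def split_by_searchable(records):
    """Separate records that belong in search results from direct-link-only ones."""
    searchable, direct_only = [], []
    for record in records:
        # Assumption: a missing flag means the record is searchable
        if record.get("searchable", True):
            searchable.append(record)
        else:
            direct_only.append(record)
    return searchable, direct_only


# Invented sample records, for illustration only
sample = [
    {"id": "rec-1", "searchable": True},
    {"id": "rec-2", "searchable": False},  # e.g. an instrument record
    {"id": "rec-3"},
]
in_search, link_only = split_by_searchable(sample)
print([r["id"] for r in in_search], [r["id"] for r in link_only])
```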