caltechlibrary / irdmtools

A Go and Python package for working with InvenioRDM repositories.
https://caltechlibrary.github.io/irdmtools

implement pg query for retrieving keys #56

Closed: rsdoiel closed this issue 11 months ago

rsdoiel commented 11 months ago

The OAI-PMH API is too slow to use for key retrieval; it should only be a fallback. I need to add support for querying Postgres directly for record ids (GetRecordIds and GetModifiedRecordIds). Since irdmtools may be working with both MySQL (EPrints) and Postgres (RDM), I need to add explicit DB variables to set up the connections.

rdmutil can check whether a DB connection is defined and, if so, query Postgres directly for the ids.

rsdoiel commented 11 months ago

I've implemented a Postgres query option for GetRecordIds. I still need to sort out which fields to use to build the id list for records modified or created in a time range.

Query over all public record ids (as written, it counts them):

```sql
SELECT COUNT(json->>'id') FROM rdm_records_metadata WHERE json->'access'->>'record' = 'public';
```

I'm assuming I should never return restricted records. I could add an option for that and adjust the query accordingly.
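
A sketch of that option, switching from `COUNT` to selecting the ids themselves. The `recordIdsQuery` function and its `includeRestricted` flag are hypothetical; by default it keeps the public-only filter from the query above.

```go
package main

import "fmt"

// recordIdsQuery returns SQL that lists record ids. By default only records
// whose access.record is "public" are included; includeRestricted is a
// hypothetical option that drops that filter.
func recordIdsQuery(includeRestricted bool) string {
	q := "SELECT json->>'id' FROM rdm_records_metadata"
	if !includeRestricted {
		q += " WHERE json->'access'->>'record' = 'public'"
	}
	return q + ";"
}

func main() {
	fmt.Println(recordIdsQuery(false)) // public records only (default)
	fmt.Println(recordIdsQuery(true))  // all records, including restricted
}
```

Keeping public-only as the default matches the assumption that restricted records are never returned unless explicitly requested.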
rsdoiel commented 11 months ago

This will be available in the upcoming v0.0.55 release. I'm still evaluating whether the REST API is fast enough for retrieving JSON objects. Most of what we need is in the rdm_records_metadata.json column, but some additional object construction is required for .files, .parents and possibly .pids.
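
A rough sketch of that extra object construction: decode the `json` column and attach separately built top-level fields such as `.files` or `.parents`. How those fields would be queried is not addressed here, and `assembleRecord` is a hypothetical helper rather than an irdmtools function.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// assembleRecord decodes an rdm_records_metadata.json value and merges in
// extra top-level fields built elsewhere (hypothetical sources).
func assembleRecord(metadata []byte, extras map[string]any) ([]byte, error) {
	var obj map[string]any
	if err := json.Unmarshal(metadata, &obj); err != nil {
		return nil, fmt.Errorf("decoding metadata json: %w", err)
	}
	if obj == nil {
		obj = map[string]any{} // a JSON null column decodes to a nil map
	}
	for k, v := range extras {
		obj[k] = v // extras override any same-named key from the column
	}
	return json.Marshal(obj)
}

func main() {
	row := []byte(`{"id":"abcde-12345","access":{"record":"public"}}`)
	out, err := assembleRecord(row, map[string]any{
		"files": map[string]any{"enabled": false},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```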