caltechlibrary / irdmtools

A Go and Python package for working with InvenioRDM repositories.
https://caltechlibrary.github.io/irdmtools

rdm2eprint, need to create primary_object structure in the JSON EPrint representation #66

Closed rsdoiel closed 9 months ago

rsdoiel commented 10 months ago

CL.js depends on a "primary_object" structure being populated. This should be doable in rdm2eprint since it requires less information than the .files attribute which I have avoided mapping.

tmorrell commented 10 months ago

You'll want the file name in the "files.default_preview" field, if set. Otherwise grab the first in "files.entries". Then put it in a URL like https://authors.library.caltech.edu/records/b38ev-3qs07/files/12864_2023_Article_9754.pdf
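A minimal sketch of that selection rule in Python, for illustration only. The function name `primary_file_url`, the `BASE_URL` constant, and the assumed record shape (a dict with `id` and a `files` object holding `default_preview` and `entries`) are hypothetical and are not taken from irdmtools; the URL pattern is copied from the example link above.

```python
from urllib.parse import quote

# Assumption: base URL taken from the example link in this thread.
BASE_URL = "https://authors.library.caltech.edu"


def primary_file_url(record, base_url=BASE_URL):
    """Return the URL of the file to treat as primary, or None if there are no files.

    Hypothetical helper: `record` is assumed to be a dict shaped like an
    RDM record, with "id" and "files" -> {"default_preview", "entries"}.
    """
    files = record.get("files") or {}
    entries = files.get("entries") or {}
    # Prefer files.default_preview when it is set and non-empty.
    name = files.get("default_preview")
    if not name:
        # Otherwise fall back to the first key found in files.entries.
        name = next(iter(entries), None)
    if name is None:
        return None
    # Percent-encode the file name so spaces and similar characters are URL-safe.
    return f"{base_url}/records/{record['id']}/files/{quote(name)}"
```

Note that "first" here means whatever order the mapping happens to have, since `files.entries` is an object rather than an array.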

rsdoiel commented 9 months ago

I’m working on the primary object. The default preview info is in the rdm_records_metadata table as

json->'files'->'default_preview'

I can join that with files_files using the value of default_preview as the key and then join that with files_object to get additional metadata (e.g. checksum, size, storage uri). Do you want the additional metadata? If you don’t, that saves two extra joins and I’ll skip it.

I’m still working on the use case where there are files but no default preview is set.

tmorrell commented 9 months ago

We don't need any additional metadata. Just the URL should be fine. If folks want the details they can go to RDM.

rsdoiel commented 9 months ago

In case you have to query Postgres and list files for a given RDM record, the query below is helpful (I didn't join with files_object to get the extra metadata).

WITH t AS (
    SELECT id AS record_id,
           json->'files'->'default_preview' AS default_preview,
           json->>'id' AS rdmid
    FROM rdm_records_metadata
)
SELECT rdmid, default_preview,
       rdm_records_files.*
FROM t
JOIN rdm_records_files
    ON (t.record_id = rdm_records_files.record_id)
WHERE rdmid = $1;

The $1 is the short RDM id parameter (not the internal UUID you never see in the UI).
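If you want to run that query from Python, something along these lines might work. This is a sketch under assumptions: `list_record_files` is a hypothetical helper, `conn` is assumed to be a DB-API connection whose driver uses `%s` placeholders (psycopg2 does), and for that reason `$1` is replaced with `%s` in the query text.

```python
# Same query as above, with the $1 placeholder rewritten as %s
# (assumption: a driver with the "format"/"pyformat" paramstyle, e.g. psycopg2).
FILES_QUERY = """
WITH t AS (
    SELECT id AS record_id,
           json->'files'->'default_preview' AS default_preview,
           json->>'id' AS rdmid
    FROM rdm_records_metadata
)
SELECT rdmid, default_preview, rdm_records_files.*
FROM t
JOIN rdm_records_files
    ON (t.record_id = rdm_records_files.record_id)
WHERE rdmid = %s
"""


def list_record_files(conn, rdmid):
    """Run the query for one short RDM id and return the rows as dictionaries.

    Hypothetical helper: `conn` is any DB-API connection to the RDM database.
    """
    cur = conn.cursor()
    try:
        # Pass the id as a bound parameter rather than formatting it into the SQL.
        cur.execute(FILES_QUERY, (rdmid,))
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()
```

Usage would look like `list_record_files(conn, "b38ev-3qs07")`, where `conn` comes from your own connection setup.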

rsdoiel commented 9 months ago

The RDM files.entries object is not an array, so per Tom I will let Postgres determine the order of the entries and pick the first one it returns as the "primary_object" for our EPrints struct. The rest will go in the related objects list.

rsdoiel commented 9 months ago

I have primary objects and related objects added to the EPrints structure from both the RDM JSON API and from direct access to Postgres. For the SQL accessed version I have sorted the entries by "key" (aka filename) so that successive rdm2eprint invocations are consistent for a given record.
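Roughly, the split into primary and related objects with a stable order could be sketched like this in Python. This is not the irdmtools code; `split_objects` is a hypothetical name, and it only returns file names rather than the full EPrints structures.

```python
def split_objects(entries, default_preview=None):
    """Return (primary, related) file names from a files.entries mapping.

    Hypothetical helper: `entries` is a dict keyed by file name, and
    `default_preview` is the value of files.default_preview, if any.
    """
    # Sort by key (file name) so the result does not depend on mapping order.
    names = sorted(entries)
    if not names:
        return None, []
    # Use the default preview when it names a real entry, else the first sorted key.
    primary = default_preview if default_preview in entries else names[0]
    # Everything that is not the primary object goes in the related list.
    related = [name for name in names if name != primary]
    return primary, related
```

Sorting before picking is what keeps repeated runs consistent for the same record when no default preview is set.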

This is included in the upcoming v0.0.60 release of irdmtools.

rsdoiel commented 9 months ago

The sort order of the entries is still random when no default preview is provided and I have to pick one, even though I impose a sort order in my SQL. I'm not going to pursue fixing that.