galterlibrary / digital-repository

DigitalHub - Institutional Repository for Galter Health Sciences
https://digitalhub.northwestern.edu/

Update page count of records with page count 0 #1166

Closed by fenekku 1 year ago

fenekku commented 1 year ago

Basically, more records are affected by https://github.com/galterlibrary/digital-repository/issues/1088.

Couple of things to check:

fenekku commented 1 year ago

Gretchen's answer to 1:

For the first question, if there's a way to change the page count in DH I think it will have to be on the back end. It's in the system-generated metadata, so I can't edit it. I can try just re-uploading the file to see if that fixes it?

In other words, Gretchen can't remediate the DH data from the web interface, so we have to do it ourselves.

sharpattack commented 1 year ago

For the second question, what are we looking for that should have a page count? For records in NUCATS Grants, do we know that there should be a page count, or is there some other tell?

Do we have a list of records that should have a page count, and what the count should be?

fenekku commented 1 year ago

I think it would mean first finding all records that have a PDF file and a page count of 0, then determining their actual page count (manually or, given the number of files, potentially automated), and finally setting it in the export as per https://github.com/galterlibrary/digital-repository/commit/56ee5fb0b2cfc3367d88b3a61317804fc25da3db.
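The first step (finding PDF records whose page count is 0) could be sketched roughly as below. This is only an illustration in Python, not the repository's actual export code: the record shape and the field names `filenames` and `page_count` are hypothetical stand-ins for whatever the real records expose.

```python
from typing import Iterable


def pdf_records_with_zero_pages(records: Iterable[dict]) -> list[dict]:
    """Return records that have at least one PDF file and a recorded page count of 0.

    Assumption: each record is a dict with a hypothetical "filenames" list and a
    hypothetical "page_count" value (int or string). Records with no page count
    at all are skipped, since they are a different case from a stored 0.
    """
    matches = []
    for record in records:
        filenames = record.get("filenames", [])
        # Match on file extension only; case-insensitive so "X.PDF" counts too.
        has_pdf = any(name.lower().endswith(".pdf") for name in filenames)
        if has_pdf and record.get("page_count") in (0, "0"):
            matches.append(record)
    return matches
```

The resulting list is what would then need an expected page count per record, whether that is gathered by hand or by reading the PDFs.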

sharpattack commented 1 year ago

😬 It looks like we never addressed the main issue in the first place. Just accounted for a couple of records.

@fenekku is there no way to edit the page count on Prism/InvenioRDM? If I were to create a new record and upload a PDF file, how would the page count get added? I don't see a way to do it from the front end.

In DigitalHub, one sample file that shows a 0 page count still shows 0 after I re-upload it as a new record, when it should show 10. Either this is a bug, or the file does not play nicely with DigitalHub's file processing.

This seems to be common for PDFs from the NUCATS Grants collection. I have not found an example outside this collection, but I can work on a script to pull all records that have a PDF file with a 0 page count. I will pass on the results of that to @gneidhardt, then we will need to get the expected page count for each.

fenekku commented 1 year ago

On Prism there is indeed no frontend to edit sizes right now, but via the API it is possible. In the export (and API), the field record.metadata.sizes is an array of strings containing the content displayed in the sizes section.
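As a rough sketch of what setting the page count in that field might look like, assuming a record is handled as a plain dict and that a page count is displayed as a string such as "10 pages" (the exact string format is an assumption, as is the helper name `with_page_count`):

```python
import copy
import re

# Assumption: page-count entries in metadata.sizes look like "10 pages" or "1 page".
PAGE_SIZE_PATTERN = re.compile(r"^\d+ pages?$")


def with_page_count(record: dict, pages: int) -> dict:
    """Return a copy of record whose metadata.sizes carries the given page count.

    Any existing page-count entry (for example "0 pages") is dropped first;
    other entries in the sizes array are kept in their original order.
    """
    if pages < 1:
        raise ValueError("pages must be a positive integer")
    updated = copy.deepcopy(record)  # leave the caller's record untouched
    metadata = updated.setdefault("metadata", {})
    sizes = [s for s in metadata.get("sizes", []) if not PAGE_SIZE_PATTERN.match(s)]
    sizes.append("1 page" if pages == 1 else f"{pages} pages")
    metadata["sizes"] = sizes
    return updated
```

This only builds the updated metadata; writing it to the export or sending it through the API is left out, since those details are not covered here.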