emory-libraries / dlp-selfdeposit


Implement custom ID scheme for Self Deposit #198

Closed eporter23 closed 4 months ago

eporter23 commented 7 months ago

For our Samvera repository objects, we want to implement a locally defined unique identifier scheme instead of using UUIDs or ARK IDs. Hyrax v5.x has dropped the previous custom Noid ID feature. Re-enabling it would mean writing a new feature for Valkyrie, which is a layer removed from Hyrax. After a planning meeting on 06/03/2024, we opted to assign Noid IDs to Collections, Publications, and FileSets via the alternate_ids attribute. This field already exists on all of those models, as well as on FileMetadata. The new goal of this issue is to have this attribute automatically assigned a Noid ID whenever the system creates an object (transactions/steps). Example of an existing override that can be used to assign an ID to FileSets: https://github.com/emory-libraries/dlp-selfdeposit/blob/main/config/initializers/hyrax_work_uploads_handler_override.rb

This scheme should:

- Be unique and not conflict with conventions used in ETDs or Curate
- Be compatible with pairtree for scalability
- Be usable for persistent URLs (separate ticket to be written)

Proposed convention: Use the same alphanumeric pattern used for Curate, but add an "-emory" suffix instead of "-cor".

Pattern to generate: a 10-character alphanumeric sequence, 123???????, followed by the text suffix "-emory", where:

- 123 are random numeric values
- ??????? are random alphabetic or numeric values

Example of desired output for Self Deposit: 7784j0zqdg-emory
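As a rough illustration of the proposed pattern, here is a minimal Ruby sketch. The module name `SelfDepositIdSketch` and its constants are hypothetical and do not come from the dlp-selfdeposit codebase. It also assumes that "alphabetic" means lowercase a-z, which the ticket does not state, and it does not check for uniqueness.

```ruby
require 'securerandom'

# Hypothetical helper for illustration only; not the project's minter.
module SelfDepositIdSketch
  DIGITS = ('0'..'9').to_a.freeze
  # Assumption: lowercase letters plus digits for the last seven positions.
  ALPHANUMERICS = (DIGITS + ('a'..'z').to_a).freeze
  SUFFIX = '-emory'

  # Builds three random digits, seven random alphanumerics, then the suffix.
  def self.mint
    head = Array.new(3) { DIGITS[SecureRandom.random_number(DIGITS.size)] }.join
    tail = Array.new(7) { ALPHANUMERICS[SecureRandom.random_number(ALPHANUMERICS.size)] }.join
    "#{head}#{tail}#{SUFFIX}"
  end
end

puts SelfDepositIdSketch.mint
```

Each call prints a 16-character string shaped like the desired output above; a real minter would also need to guarantee that no ID is issued twice.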

Prior work references:

- Epic for Curate ID scheme
- Example of an ETD minted ID: qj72p877j (9 characters)

eporter23 commented 5 months ago

There is a blocker related to the local Docker environment, but the code work is moving forward.

bwatson78 commented 5 months ago

@alexBLR I've rewritten the requirements.

abelemlih commented 4 months ago

I am currently seeing the following error in my local environment as well as Test:

Screenshot 2024-06-19 at 5.08.46 PM.png

It is tied to changes in this ticket and this PR specifically: https://github.com/emory-libraries/dlp-selfdeposit/pull/349/files. We may need to revert some of these changes to get publication creation to work properly.

bwatson78 commented 4 months ago

@abelemlih I'd like the chance to fix this in place. Would it be OK if I work on this today, and if a solution isn't found by the end of the day, we revert it then?

bwatson78 commented 4 months ago

Proof of Publication #alternate_ids persistence: https://fedora-depo-arch.libraries.emory.edu/fcrepo/rest/production/68121a4f-4c3c-4a04-9066-5a3072c99004

bwatson78 commented 4 months ago

Proof of FileSet #alternate_ids persistence: https://fedora-depo-arch.libraries.emory.edu/fcrepo/rest/production/ef2877bc-4a69-46d4-9a14-492d77909ac4

eporter23 commented 4 months ago

I am seeing alternate_ids getting generated and persisted for works and FileSets! However, the pattern mapped out in the ticket description doesn't seem to be in place. Examples: 1r66j112r-emory and 8k71nh08w-emory. Following the Curate ID scheme, the first 3 characters should be numbers only.
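One way to spot this kind of mismatch is a quick check against the pattern from the ticket description. The sketch below is only an illustration: the constant `PROPOSED_ID_PATTERN` and the method `matches_proposed_pattern?` are hypothetical names, and the regular expression assumes lowercase letters and digits for the last seven characters.

```ruby
# Hypothetical check; assumes lowercase letters and digits after the first three digits.
PROPOSED_ID_PATTERN = /\A\d{3}[0-9a-z]{7}-emory\z/

def matches_proposed_pattern?(id)
  PROPOSED_ID_PATTERN.match?(id)
end

# The two observed IDs and the desired example from the ticket description.
%w[1r66j112r-emory 8k71nh08w-emory 7784j0zqdg-emory].each do |id|
  puts "#{id}: #{matches_proposed_pattern?(id)}"
end
```

Under these assumptions the two observed IDs fail the check, since each has a letter in its second position and only 9 characters before the suffix, while the desired example passes.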

bwatson78 commented 4 months ago

@eporter23 I will make a PR that implements that change.

eporter23 commented 4 months ago

@bwatson78 and @alexBLR one thing I am also seeing when the alternate_ids are set is that it essentially creates a new reference Fedora resource each time. I'm not sure if there is anything we can do about it, but I wanted to note this because that will mean a lot of additional objects in Fedora! I will also note this for the next Fedora 6 WG meeting.

eporter23 commented 4 months ago

I tested again and got this error: Screenshot 2024-06-25 at 9.51.16 AM.png

bwatson78 commented 4 months ago

@eporter23 I'll work on this now that I've confirmed that the Shibboleth work needs Kaeln's help to move further.

bwatson78 commented 4 months ago

@eporter23 This was happening because we hadn't set up a specific way to retain the known minted IDs once they were created. I have followed the Curate convention and have specified that we're using the database-backed models.
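To illustrate why retaining minted IDs matters, here is a simplified Ruby sketch. It is not the project's implementation and not how any particular minting library stores its state. The class `RememberingMinterSketch` is hypothetical, and an in-memory `Set` stands in for the database table so the example can run on its own.

```ruby
require 'set'

# Hypothetical minter; the Set is a stand-in for a database-backed store.
class RememberingMinterSketch
  MAX_ATTEMPTS = 10

  # candidates: any callable that returns a candidate ID string.
  def initialize(candidates, store = Set.new)
    @candidates = candidates
    @store = store
  end

  # Returns the first candidate that has not been issued before.
  def mint
    MAX_ATTEMPTS.times do
      id = @candidates.call
      # Set#add? returns nil when the ID is already recorded.
      return id if @store.add?(id)
    end
    raise 'could not mint an unused ID'
  end
end

# Demo with a deliberately repetitive candidate source.
queue = %w[111aaaaaaa-emory 111aaaaaaa-emory 222bbbbbbb-emory]
minter = RememberingMinterSketch.new(-> { queue.shift })
first = minter.mint
second = minter.mint
puts first, second
```

Without the store, the second call would hand out the repeated candidate again. Because an in-memory set is lost on restart, a real deployment needs the record of issued IDs to live in durable storage such as the database.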

bwatson78 commented 4 months ago

We will probably have to clear out all of the objects and start fresh to get this going right.

eporter23 commented 4 months ago

So far in Arch, after the reset, things are looking good! Screenshot 2024-06-26 at 11.43.30 AM.png

bwatson78 commented 4 months ago

The wipe and deployment of both Test and Arch are complete. PR made: https://github.com/emory-libraries/dlp-selfdeposit/pull/366

eporter23 commented 4 months ago

Thanks so much @alexBLR and @bwatson78 for all the work on this: I think we are finally done!

eporter23 commented 4 months ago

A note for posterity: we are persisting these as http://id.loc.gov/vocabulary/identifiers/local#emory