eclipse-pass / pass-deposit-services

Deposit Services are responsible for the transfer of custodial content and metadata from end users to repositories.
Apache License 2.0

WIP: Support for Submission.metadata collection hints (master) #270

Closed (emetsger closed 4 years ago)

emetsger commented 4 years ago

# DO NOT MERGE

Don't merge this PR until the PR against the maintenance branch is merged.

https://github.com/OA-PASS/deposit-services/pull/269 has been merged, and version 1.0.1-3.4 released from the 1.0.0-3.4-maint branch.

About

This PR provides support for selecting an appropriate collection for deposit based on hints supplied by the Submission, and a mapping of hints to collection URLs within Deposit Services. See the end of this PR description for an alternate approach that we could adopt in the future if we desire.

How it works

If Submission.metadata contains a hints object with a collection-tags array, Deposit Services checks whether any of the supplied tags matches a tag configured under the collection-hints key of the SWORD protocol binding configuration. If a supplied tag matches a configured one, the URL configured for that tag is used for the deposit. Otherwise, the configured default-collection is used.

Here is an example Submission.metadata carrying a hints object (based on the metadata-schemas PR).
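As a minimal illustration of such a metadata document, the sketch below assumes Submission.metadata is a serialized JSON document. Only the `hints` object and the `collection-tags` array come from this description; the `title` field, the `covid` tag value, and the `collection_tags` helper are placeholders, and the authoritative shape is whatever the metadata-schemas PR defines.

```python
import json

# Hypothetical Submission.metadata, assumed here to be a serialized JSON document.
# Only "hints" and "collection-tags" come from the PR description;
# the "title" field and the tag value are placeholders.
submission_metadata = """
{
  "title": "An example article",
  "hints": {
    "collection-tags": ["covid"]
  }
}
"""


def collection_tags(metadata_json):
    """Return the collection tags in order, or an empty list if none are present."""
    metadata = json.loads(metadata_json)
    hints = metadata.get("hints") or {}
    return list(hints.get("collection-tags") or [])


print(collection_tags(submission_metadata))  # prints ['covid']
```

A Submission without a hints object yields an empty list, which corresponds to the case where the default collection is used.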

If the Submission contains multiple hints, Deposit Services looks up a configured collection for each hint in turn and stops at the first match. Any remaining hints are not tested. At most one deposit is made.
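The first-match behavior described above can be sketched as follows. This is not the Deposit Services implementation; `select_collection` and the example URLs are hypothetical, and the sketch only assumes that configured hints behave like a mapping from tag to collection URL.

```python
def select_collection(tags, collection_hints, default_collection):
    """Return the URL for the first tag with a configured hint, else the default.

    tags: tags from Submission.metadata hints, in the order supplied.
    collection_hints: mapping of configured tag -> collection URL.
    default_collection: URL used when no supplied tag is configured.
    """
    for tag in tags:
        url = collection_hints.get(tag)
        if url is not None:
            return url  # first match wins; remaining tags are not tested
    return default_collection


# Hypothetical collection URLs, for illustration only.
hints = {"covid": "https://dspace.example.org/swordv2/collection/1234/5678"}
default = "https://dspace.example.org/swordv2/collection/1234/1"

print(select_collection(["unknown", "covid"], hints, default))
# prints https://dspace.example.org/swordv2/collection/1234/5678
```

Because the function returns exactly one URL in every case, a Submission results in at most one deposit regardless of how many tags it carries.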

Here is an example transport configuration for SWORD that configures the covid hint using the new collection-hints object. In this example, the Deposit Services environment would need to define a DSPACE_COVID_HANDLE variable whose value is the handle of a valid JScholarship collection.

"transport-config": {
  "auth-realms": [
    {
      "mech": "basic",
      "username": "${dspace.username}",
      "password": "${dspace.password}",
      "url": "${dspace.baseuri}/swordv2"
    }
  ],
  "protocol-binding": {
    "protocol": "SWORDv2",
    "username": "${dspace.username}",
    "password": "${dspace.password}",
    "server-fqdn": "${dspace.host}",
    "server-port": "${dspace.port}",
    "service-doc": "${dspace.baseuri}/swordv2/servicedocument",
    "default-collection": "${dspace.baseuri}/swordv2/collection/${dspace.collection.handle}",
    "on-behalf-of": null,
    "deposit-receipt": true,
    "user-agent": "pass-deposit/${deposit.services.version}",
    "collection-hints": {
      "covid": "${dspace.baseuri}/swordv2/collection/${dspace.covid.handle}"
    }
  }
}
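To make the relationship between the `${dspace.covid.handle}` placeholder and the `DSPACE_COVID_HANDLE` environment variable concrete, here is a rough sketch. It assumes the common convention of upper-casing a property name and replacing dots with underscores. How Deposit Services actually resolves placeholders is not shown in this PR, and the `resolve` and `env_name` helpers, the base URI, and the handle value are all made up for illustration.

```python
import re

# Matches ${some.property.name} placeholders.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def env_name(prop):
    """Map a dotted property name to an environment variable name (assumed convention)."""
    return prop.upper().replace(".", "_")


def resolve(template, env):
    """Substitute every ${...} placeholder in template using values from env."""
    def lookup(match):
        name = env_name(match.group(1))
        if name not in env:
            raise KeyError(f"missing environment variable {name}")
        return env[name]

    return PLACEHOLDER.sub(lookup, template)


# Hypothetical environment values, for illustration only.
env = {
    "DSPACE_BASEURI": "https://dspace.example.org",
    "DSPACE_COVID_HANDLE": "1234/5678",
}

print(resolve("${dspace.baseuri}/swordv2/collection/${dspace.covid.handle}", env))
# prints https://dspace.example.org/swordv2/collection/1234/5678
```

Failing loudly on a missing variable is a deliberate choice in this sketch: an unresolved handle would otherwise produce a collection URL that silently points nowhere.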

Configuration summary

In summary, the UI populates the Submission.metadata.hints.collection-tags array. Deposit Services configures a mapping between tags and collection URLs in the transport-config.protocol-binding.collection-hints object. If there is a match between Submission hints and Deposit Services hints, the first match is used to direct the deposit to the configured collection. Otherwise the default collection is used.

How it could work

In the future, it probably makes more sense for the SWORD endpoint itself to advertise the hints used to select collections. For example, the SWORD service document for JScholarship could carry the tags, rather than having them embedded in the Deposit Services configuration. This puts control over collection selection in the hands of the JScholarship administrators and relieves Deposit Services of hard-coding mappings between hints and collection URLs. This approach would require some discussion with LAG before pursuing, but may be worth doing depending on how we see PASS interacting with J10P in the future (or with other potential endpoints such as Dataverse).