PACKED-vzw / resolver

The Resolver application is a tool for creating, managing, and using persistent URIs.

Object ID collisions when dealing with data from multiple institutions? #6

Open netsensei opened 9 years ago

netsensei commented 9 years ago

Problem

Our resolver contains persistent URIs pointing to objects coming from multiple institutions. Each institution has its own format/scheme for identifying objects via a unique ID.

The ID is unique within the domain of that institution. However, if subsets originating from different domains use similar identification schemes (e.g. incremental numbering), a collision between IDs is possible.

Example: Work A in institution Z is identified as 001 while Work B in institution Y is also identified as 001.

How to reproduce

  1. Create a CSV with 2 object entries
  2. Make sure each object has the same object identifier
  3. Import the CSV in the resolver

Observed behaviour

Only 1 entry is created.

Expected behaviour

A correct import with 2 different entries, both containing active persistent URIs (data & representation).
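The reproduction steps can be sketched in a few lines of Python. This is only a toy model of the collision, not the resolver's import code: the column names (`object_id`, `institution`, `title`) and the `import_by_id` helper are hypothetical, and the sketch assumes the import keys entries on the object ID alone.

```python
import csv
import io

# Hypothetical two-row CSV; the column names are illustrative only and are
# not the resolver's actual import format.
CSV_TEXT = """object_id,institution,title
001,Z,Work A
001,Y,Work B
"""


def import_by_id(csv_text):
    """Toy import that keys every entry on the object ID alone."""
    entries = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A later row with the same ID silently replaces the earlier one.
        entries[row["object_id"]] = row
    return entries


entries = import_by_id(CSV_TEXT)
print(len(entries))  # 1
```

Two rows go in, one entry comes out, which matches the observed behaviour described above.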

Resolution

Introduce a "domain" or "namespace" property in the data model. This could be used to encapsulate subsets using similar identification schemes.

I'm proposing the generic "domain" label instead of "institution" to make this property as flexible as possible. This way, it remains easy to create subsets within an institution which use the same identification scheme (e.g. subcollections which use the same numbering format).
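As a rough illustration of the proposal, the sketch below keys entries on a (domain, ID) pair instead of the ID alone. The `domain` column and the `import_by_domain_and_id` helper are assumptions made for this example; they do not describe the resolver's current data model.

```python
import csv
import io

# Hypothetical CSV with an extra "domain" column (an assumption, not the
# resolver's actual import format).
NAMESPACED_CSV = """domain,object_id,title
Z,001,Work A
Y,001,Work B
"""


def import_by_domain_and_id(csv_text):
    """Toy import that keys every entry on the (domain, object ID) pair."""
    entries = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["domain"], row["object_id"])
        if key in entries:
            # IDs still have to be unique within a single domain.
            raise ValueError(f"duplicate entry for {key}")
        entries[key] = row
    return entries


namespaced_entries = import_by_domain_and_id(NAMESPACED_CSV)
print(len(namespaced_entries))  # 2
```

Both works survive the import because the domain disambiguates the shared ID 001, while a duplicate inside one domain is still rejected.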

Impact

The persistent URI itself will need to be modified to include the "namespace" or "domain" property.

nvgeele commented 9 years ago

At the inception of the project (before I joined) the decision was made to not include support for namespaces/different domains in the resolver. I will close this issue until this decision is reversed, which I do not believe will happen after speaking with Bert.

bert-packed commented 9 years ago

Some more information about why we did not include 'namespaces': We considered it the responsibility of each institution to ensure IDs are unique in their local domain. This is to be achieved in the collection management system.

The case of VKC is different indeed, because you basically maintain a knowledge database about works in multiple collections. However, I believe we should stick to the same principle: VKC is responsible for creating IDs that are unique in their own domain. This should not be solved by the resolver.

That said: what can be a practical solution? It makes sense to reuse the Work IDs from the collection institutions. Why not hard-code the prefix in the ID? http://vlaamsekunstcollectie.be/collection/work/data/1234 would become http://vlaamsekunstcollectie.be/collection/work/data/mskgent-1234.
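A minimal sketch of the prefix approach, assuming the prefix is added before the data reaches the resolver (for instance when preparing the import). The helper names `prefixed_id` and `data_uri` are hypothetical; only the base URL and the `mskgent-1234` form come from the example above.

```python
# Base URL taken from the example above.
BASE = "http://vlaamsekunstcollectie.be/collection/work/data"


def prefixed_id(institution, local_id):
    """Combine an institution prefix and a local ID into one flat ID."""
    return f"{institution}-{local_id}"


def data_uri(object_id):
    """Build the persistent data URI for an already-unique object ID."""
    return f"{BASE}/{object_id}"


uri = data_uri(prefixed_id("mskgent", "1234"))
print(uri)  # http://vlaamsekunstcollectie.be/collection/work/data/mskgent-1234
```

The resolver still sees a single opaque ID, so neither its data model nor its routing has to change.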

The other option, where we add a specific property to the resolver data model to create http://vlaamsekunstcollectie.be/collection/work/data/mskgent/1234, has considerable implications for the URI routing in the resolver, all the more so because the use of the collection node is optional. It would also allow institutions to use the namespace logic to avoid the challenge of creating unique IDs for works in one collection, whereas creating such IDs is actually something we want to encourage. So that's why we would like to avoid this solution.
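To make the routing concern concrete, here is a hedged sketch. The two regular expressions are assumptions inferred from the example URLs in this thread (an optional "collection" node followed by entity type, document type and ID); they are not the resolver's actual route definitions.

```python
import re

# Assumed current shape: [collection/]<entity>/<doc>/<id>
CURRENT = re.compile(
    r"^/(?:collection/)?(?P<entity>[^/]+)/(?P<doc>[^/]+)/(?P<id>[^/]+)$"
)

# Assumed namespaced shape: [collection/]<entity>/<doc>/<ns>/<id>
NAMESPACED = re.compile(
    r"^/(?:collection/)?(?P<entity>[^/]+)/(?P<doc>[^/]+)"
    r"/(?P<ns>[^/]+)/(?P<id>[^/]+)$"
)

path = "/collection/work/data/1234"

current_match = CURRENT.match(path)
namespaced_match = NAMESPACED.match(path)

# The existing URI is read as intended by the current pattern...
print(current_match.group("entity"), current_match.group("id"))
# ...but the namespaced pattern also accepts it, treating "collection" as the
# entity type and "data" as the namespace.
print(namespaced_match.group("entity"), namespaced_match.group("ns"))
```

Under these assumptions an existing URI without a namespace is also a valid namespaced URI with a different meaning, so the router would need extra rules to tell the two apart.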