american-art / npg

National Portrait Gallery
Creative Commons Zero v1.0 Universal

NPGConstituents:ConstituentID #40

Closed steads closed 7 years ago

steads commented 8 years ago

ConstituentID: This is a temporary system identifier used to tie records together between files and as such there is no need to map it.

VladimirAlexiev commented 8 years ago

ConstituentID=6 is missing, so I think this is a permanent ID from their database. It's not generated just for the export. In any case, it's used in the URL so it's permanent. I think it's better to keep it.

steads commented 8 years ago

Even if it is in the database, it is not used as an identifier by real users; it is just used by the system to tie things together. Being incorporated as an element of a longer real identifier does not make it special in any way. It will not be used as a target of integration by external agents.

VladimirAlexiev commented 8 years ago

How do you know there's no search by this ID in their system (TMS)? I think in the integrated system it makes sense to have a general search "by identifier", and it could search for this. If there's an ID, it seems prudent to include it.

steads commented 8 years ago

I am not talking about an "integrated system" but about "integration", that is, joining different data sources together. Internal system IDs are not useful for this functionality.

edgartdata commented 8 years ago

Is there value in using the TMS ConstituentID to start the entity matching process? For example, state that at the NPG the TMS ConstituentID for Benjamin West is 4, and NPG will match it with http://vocab.getty.edu/ulan/500026989

steads commented 8 years ago

Using ULAN IDs would be excellent, but if NPG are going to do this, why expose an internal system identifier to the world? Nobody else will be using it, so why not just state that NPG has an instance of E21 Person {ULAN500026989} and then have the properties link to that instance? So, for example, P1 is identified by (identifies) E41 Appellation {Benjamin West}. This would not be such a good solution if NPG wanted to expose their reasoning for drawing that conclusion, perhaps using CRMinf. In that case the TMS constituent ID would form a perfectly good identifier for the instance of E21 Person, but I would still be dubious about the utility of mapping the TMS constituent ID to an instance of E42 Identifier as well. It is not wrong, but I find it difficult to think of a use case in the world of integration.
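To make the two options in this comment concrete, here is a minimal sketch that builds the triples as plain Python tuples. It is only an illustration: the CRM namespace string, the `/appellation` and `/identifier` URI suffixes, the use of `rdfs:label` for the values, and the helper name `person_triples` are all assumptions, not part of the NPG mapping.

```python
# Assumed namespaces; the project may use different ones.
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def person_triples(person_uri, name, internal_id=None):
    """Hypothetical helper: describe an E21 Person with an E41 Appellation
    and, only if internal_id is given, an additional E42 Identifier."""
    appellation = person_uri + "/appellation"  # hypothetical URI pattern
    triples = [
        (person_uri, RDF_TYPE, CRM + "E21_Person"),
        (person_uri, CRM + "P1_is_identified_by", appellation),
        (appellation, RDF_TYPE, CRM + "E41_Appellation"),
        (appellation, RDFS_LABEL, name),
    ]
    if internal_id is not None:
        identifier = person_uri + "/identifier"  # hypothetical URI pattern
        triples += [
            (person_uri, CRM + "P1_is_identified_by", identifier),
            (identifier, RDF_TYPE, CRM + "E42_Identifier"),
            (identifier, RDFS_LABEL, str(internal_id)),
        ]
    return triples


ULAN_WEST = "http://vocab.getty.edu/ulan/500026989"

# Option 1: the person and appellation only, no internal ID exposed.
ulan_only = person_triples(ULAN_WEST, "Benjamin West")

# Option 2: the TMS constituent ID is also mapped to an E42 Identifier.
with_identifier = person_triples(ULAN_WEST, "Benjamin West", internal_id=4)
```

The only difference between the two results is the three extra triples for the E42 Identifier, which is the part whose usefulness is being debated here.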

caknoblock commented 8 years ago

Here is a real use case based on how we plan to create the linked data:

1) We map the data into CIDOC-CRM
2) We then use our automated linking tools to link the artists from NPG to the Getty ULAN
3) We then allow the museums to curate the links, using our link curation tool, to ensure that all of the information has been accurately linked
4) NPG updates their data about constituents and we remap the data to CIDOC-CRM

If we have used the TMS constituent ID to create the URIs, then all of our links between NPG and ULAN will be preserved. Otherwise, we would have to repeat the linking process.
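The point about preserved links can be sketched as follows. This is a hedged illustration, not the project's actual pipeline: the base namespace, the record layout, the second constituent (ID 7), and the helper names `mint_uri` and `remap` are all hypothetical.

```python
BASE = "http://example.org/npg/constituent/"  # hypothetical namespace


def mint_uri(constituent_id):
    """Derive a URI deterministically from the TMS constituent ID."""
    return f"{BASE}{constituent_id}"


def remap(records):
    """Stand-in for steps 1 and 4: map exported records to URI-keyed data."""
    return {mint_uri(r["ConstituentID"]): r["name"] for r in records}


# Steps 1-3: first export, then a curated link to ULAN keyed by the minted URI.
export_v1 = [{"ConstituentID": 4, "name": "Benjamin West"}]
mapped_v1 = remap(export_v1)
curated_links = {mint_uri(4): "http://vocab.getty.edu/ulan/500026989"}

# Step 4: NPG updates its data (name form changed, one new record) and we remap.
export_v2 = [
    {"ConstituentID": 4, "name": "West, Benjamin"},
    {"ConstituentID": 7, "name": "Example Person"},  # made-up record
]
mapped_v2 = remap(export_v2)

# Links survive because the URI depends only on the stable ID, not on the name.
preserved = {uri: curated_links[uri] for uri in mapped_v2 if uri in curated_links}
needs_linking = [uri for uri in mapped_v2 if uri not in curated_links]
```

Had the URIs been minted from something that changes between exports, such as the name string or a row position, `preserved` would be empty after the remap and the curation in step 3 would have to be repeated.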

steads commented 8 years ago

I presume there is a step 1A where you actually transform your data following your mapping and some URI generation rules. Similarly, I guess there will be a step 4A as well. This seems fine to me, but it still does not require a mapping of the internal database ID, as it is just being used as a constituent part of your URIs.

workergnome commented 8 years ago

I think as long as the CIDOC-CRM data is co-existing alongside tools that use a unique ID (that is not the URL), it is important to maintain a way to correlate between those datasets.

I know that at my institution we have a set of tools that identify works of art by ID number, and if any new tool were to be developed using the CIDOC-CRM data, it would need to have access to that ID number to interoperate with the legacy systems.
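As a rough sketch of that interoperability need: if each published resource keeps its legacy ID as data, a new tool can build a lookup from the ID numbers the old tools use to the new URIs. The record layout, the URIs, and the function name `build_legacy_index` below are hypothetical.

```python
def build_legacy_index(records):
    """Index published resources by the ID number that legacy tools use."""
    return {r["legacy_id"]: r["uri"] for r in records}


# Hypothetical published records that retain the internal ID as data.
published = [
    {"uri": "http://example.org/npg/constituent/4", "legacy_id": 4},
    {"uri": "http://example.org/npg/constituent/7", "legacy_id": 7},
]

index = build_legacy_index(published)

# A legacy tool that only knows "4" can now find the linked-data resource.
uri_for_4 = index.get(4)
missing = index.get(6)  # no record with this ID, so the lookup yields None
```

Without the ID available as data, the same correlation would depend on parsing it back out of the URI string, which ties every consumer to the URI pattern.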