american-art / npg

National Portrait Gallery
Creative Commons Zero v1.0 Universal

URLs for Unidentified Actors #50

Closed: edgartdata closed this issue 7 years ago

edgartdata commented 8 years ago

How do we create unique URLs for Unidentified Actors? Although they all share the same appellation, and probably the same authority record in our databases, we understand Unidentified Actors to be distinct entities. Therefore they should get unique URLs, just like any other E39 Actor. How do they get different URLs in the mapping process?

VladimirAlexiev commented 8 years ago

Unique URLs per use in every object. OK, you may not like what follows.

Currently it says:

@base <http://americanartcollaborative.org/npg/>.
<person-institution/0> a crm:E39_Actor. # crm:P131_is_identified_by... Unidentified Artist
<person-institution/1> a crm:E39_Actor. # crm:P131_is_identified_by... Unidentified Man
<person-institution/2> a crm:E39_Actor. # crm:P131_is_identified_by... Unidentified Woman
<person-institution/3> a crm:E39_Actor. # crm:P131_is_identified_by... Unidentified Child

# and when used in an object, eg:
<object/1> crm:P62_depicts <person-institution/1>, <person-institution/2>.
<object/1/production> crm:P14_carried_out_by <person-institution/0>.

This should become:

<object/1> crm:P62_depicts <object/1/depicts-unidentified/1>, <object/1/depicts-unidentified/2>.
<object/1/production> crm:P14_carried_out_by <object/1/producedBy-unidentified/0>.

<object/1/producedBy-unidentified/0> a crm:E39_Actor.  # crm:P131_is_identified_by... Unidentified Artist
<object/1/depicts-unidentified/1>    a crm:E21_Person. # crm:P131_is_identified_by... Unidentified Man
<object/1/depicts-unidentified/2>    a crm:E21_Person. # crm:P131_is_identified_by... Unidentified Woman
<object/1/depicts-unidentified/3>    a crm:E21_Person. # crm:P131_is_identified_by... Unidentified Child
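A minimal Python sketch of this rewrite follows. It assumes that each use of a constituent is available as an `(object_id, role, constituent_id)` tuple and that the shared "Unidentified ..." authority records are constituents 0 to 3. The URL pattern comes from the Turtle above. The function names and the ID set are hypothetical.

```python
# Hypothetical sketch: mint per-object, per-role URLs for unidentified actors,
# following the pattern object/{object_id}/{role}-unidentified/{constituent_id}.
BASE = "http://americanartcollaborative.org/npg/"

# Assumption: constituents 0-3 are the shared "Unidentified ..." authority records.
UNIDENTIFIED_IDS = {0, 1, 2, 3}


def mint_unidentified_url(object_id: int, role: str, constituent_id: int) -> str:
    """Return a URL unique to this object, role and constituent record."""
    return f"{BASE}object/{object_id}/{role}-unidentified/{constituent_id}"


def actor_url(object_id: int, role: str, constituent_id: int) -> str:
    """Identified constituents keep their global URL; unidentified ones get a per-object URL."""
    if constituent_id in UNIDENTIFIED_IDS:
        return mint_unidentified_url(object_id, role, constituent_id)
    return f"{BASE}person-institution/{constituent_id}"


usages = [(1, "depicts", 1), (1, "depicts", 2), (1, "producedBy", 0), (1, "depicts", 5)]
for usage in usages:
    print(actor_url(*usage))
```

Only the unidentified records are split per object and role. An identified constituent such as 5 keeps its single global URL.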

And if you want to search for unidentified persons, you could add:

<http://americanartcollaborative.org/thesaurus/gender/male>
  a crm:E74_Group. # crm:P131_is_identified_by "Male people"

<http://americanartcollaborative.org/thesaurus/gender/female>
  a crm:E74_Group. # crm:P131_is_identified_by "Female people"

<http://americanartcollaborative.org/thesaurus/agent/unidentified>
  a crm:E74_Group. # crm:P131_is_identified_by "Unidentified people"

<http://americanartcollaborative.org/thesaurus/agent/unidentified/men>
  a crm:E74_Group; # crm:P131_is_identified_by "Unidentified men"
  crm:P107i_is_current_or_former_member_of
    <http://americanartcollaborative.org/thesaurus/gender/male>,
    <http://americanartcollaborative.org/thesaurus/agent/unidentified>.

<http://americanartcollaborative.org/thesaurus/agent/unidentified/women>
  a crm:E74_Group; # crm:P131_is_identified_by "Unidentified women"
  crm:P107i_is_current_or_former_member_of
    <http://americanartcollaborative.org/thesaurus/gender/female>,
    <http://americanartcollaborative.org/thesaurus/agent/unidentified>.

<http://americanartcollaborative.org/thesaurus/agent/unidentified/children>
  a crm:E74_Group; # crm:P131_is_identified_by "Unidentified children"
  crm:P107i_is_current_or_former_member_of
    <http://americanartcollaborative.org/thesaurus/agent/unidentified>.

And then for the per-object instances:

<object/1/producedBy-unidentified/0>
  crm:P107i_is_current_or_former_member_of <http://americanartcollaborative.org/thesaurus/agent/unidentified>.

<object/1/depicts-unidentified/1>
  crm:P107i_is_current_or_former_member_of <http://americanartcollaborative.org/thesaurus/agent/unidentified/men>.

<object/1/depicts-unidentified/2>
  crm:P107i_is_current_or_former_member_of <http://americanartcollaborative.org/thesaurus/agent/unidentified/women>.

<object/1/depicts-unidentified/3>
  crm:P107i_is_current_or_former_member_of <http://americanartcollaborative.org/thesaurus/agent/unidentified/children>.
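To find all unidentified men, or all unidentified people, a query has to follow the membership links one or more steps. In SPARQL 1.1 this would be the property path `crm:P107i_is_current_or_former_member_of+`. The Python sketch below shows that traversal over the groups above, assuming the links are held in a plain dictionary. The dictionary and the function names are hypothetical, and the child instance is left out for brevity.

```python
from collections import deque

T = "http://americanartcollaborative.org/thesaurus/"
O = "http://americanartcollaborative.org/npg/object/1/"

# crm:P107i_is_current_or_former_member_of links, keyed by the member.
MEMBER_OF = {
    T + "agent/unidentified/men": [T + "gender/male", T + "agent/unidentified"],
    T + "agent/unidentified/women": [T + "gender/female", T + "agent/unidentified"],
    T + "agent/unidentified/children": [T + "agent/unidentified"],
    O + "producedBy-unidentified/0": [T + "agent/unidentified"],
    O + "depicts-unidentified/1": [T + "agent/unidentified/men"],
    O + "depicts-unidentified/2": [T + "agent/unidentified/women"],
}


def groups_of(node: str) -> set:
    """Every group reachable from node by following membership links one or more times."""
    seen = set()
    queue = deque(MEMBER_OF.get(node, []))
    while queue:
        group = queue.popleft()
        if group not in seen:
            seen.add(group)
            queue.extend(MEMBER_OF.get(group, []))
    return seen


def members_of(group: str) -> list:
    """Per-object instances that belong to group, directly or through another group."""
    return sorted(node for node in MEMBER_OF if node.startswith(O) and group in groups_of(node))


print(members_of(T + "gender/male"))         # only depicts-unidentified/1
print(members_of(T + "agent/unidentified"))  # all three per-object instances
```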

Why do I suspect @azaroth42 would be disgusted by such complexity?

azaroth42 commented 8 years ago

I agree that there should be a new instance of an "unidentified person" for every unidentified person :) If the person is identified later, or if more becomes known about them short of their actual identity, then there's a resource to associate that information with.

I do think the group construct is overly complex. Just use schema:gender (or whatever, I'm not religious about it) on the instance to record what is known about the person. Otherwise you'd need to move membership from the unidentified-women group to the unidentified-women-born-in-august-in-western-europe-with-first-name-maria group.

I wouldn't put "unidentified" in the URL slug. It's not very future-proof in case the person does become identified later.

edgartdata commented 8 years ago

So what will happen for museums that only have one "unidentified person" record, whether that person is American, British, male, female, a person or a group, an artist or a donor? How will it work to create a new unidentified-person instance every time this single database record is used? What worries me are the cases where museums cannot say whether it's an unidentified woman or man, ... Isn't there a more agnostic way of creating new instances/URIs?

azaroth42 commented 8 years ago

I'm afraid I don't follow the problem. Every time there's a reference to an unidentified person, generate a URI that identifies that person. You might not know anything about the person, but you know they must have existed 😄

e.g. for an artwork with a completely unknown artist:

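@prefix crm:  <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://museum.org/lod/art/1> a crm:E22_Man-Made_Object ;
  crm:P108i_was_produced_by <http://museum.org/lod/event/production/1> .
<http://museum.org/lod/event/production/1> a crm:E12_Production ;
  crm:P14_carried_out_by <http://museum.org/lod/person/1> .
<http://museum.org/lod/person/1> a crm:E39_Actor ;
  rdfs:label "Unknown actor" .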

[It might not have been a "Person": for a recent work it could have been a machine, or it could have been a group of people; hence Actor.]

edgartdata commented 8 years ago

So <museum.org/lod/person/1> will get a randomly or sequentially assigned number every time my one TMS unidentified-actor authority record is used in a mapping? Does this special treatment for minting URIs for unidentified actors need to be documented in a special way in the mapping process here? (It strikes me that my question is so basic that that's why it makes no sense to you, Rob!)

azaroth42 commented 8 years ago

I would construct the URI from other identifiers to ensure consistency across exports. If it's the artist role for object 789, the URI might be http://museum.org/lod/person/artist_789. It would not be good to have them randomly or sequentially assigned, as they'd change with every export. If we agree that having a distinct resource is the way to go, then we should definitely document the mapping :)
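The difference between the two minting strategies can be sketched in Python. The `artist_789` pattern comes from the comment above. The function names and the `(role, object_id)` record layout are hypothetical.

```python
from itertools import count

BASE = "http://museum.org/lod/person/"


def mint_deterministic(role: str, object_id: int) -> str:
    """Build the URI only from identifiers already in the record, e.g. artist_789."""
    return f"{BASE}{role}_{object_id}"


def mint_sequential(records: list) -> dict:
    """Number unidentified actors in export order (the approach to avoid)."""
    counter = count(1)
    return {record: f"{BASE}{next(counter)}" for record in records}


export_1 = [("artist", 789), ("artist", 790)]
export_2 = [("artist", 790), ("artist", 789)]  # same records, different export order

print(mint_sequential(export_1)[("artist", 789)])  # http://museum.org/lod/person/1
print(mint_sequential(export_2)[("artist", 789)])  # http://museum.org/lod/person/2
print(mint_deterministic("artist", 789))           # http://museum.org/lod/person/artist_789
```

The sequential URI for the same artist changes as soon as the export order changes. The deterministic one is identical in every export.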

VladimirAlexiev commented 8 years ago

If the person becomes identified later then there's a resource to associate that information with.

I don't completely agree. If it now says

<object/1/depicts-unidentified/1> "Unidentified man" 

and later someone identifies him as

<constituent/5> "John Smith"

What would we do? State sameAs?

<object/1/depicts-unidentified/1> owl:sameAs <constituent/5>

There are problems with this:

Or would we let John Smith have two unrelated URLs? Or rewrite the per-object URL with the global one? These problems occur whether or not we have "unidentified" in the URL.

schema vs CRM

This same question recurs many times, so we need to decide it separately. To some extent it is "simplicity vs. complexity", but CRM also provides many fundamentally sound modeling constructs that let you model various historical and art situations in a principled way.

We also need a way to mark those constituents as "Unidentified" so that they can be searched for.

museums that only have one "unidentified person" record

No problem, exactly the same way:

<object/1/depicts-unidentified/1>    a crm:E21_Person. # crm:P131_is_identified_by... "Unidentified"
<object/1/depicts-unidentified/1>
  crm:P107i_is_current_or_former_member_of <http://americanartcollaborative.org/thesaurus/agent/unidentified>.

will get randomly assigned sequential numbers every time my one TMS unidentified actor authority record is used in a mapping? I would construct the URI from other identifiers to ensure consistency across exports

Yes, it's better to have deterministic URLs, and my proposal uses deterministic URLs.

Isn't there a more agnostic way of creating new instances/URIs?

My proposal combines the object_id and constituent_id in the URL. If a museum has 10 different unidentified constituent records (as in the first example), it would use those 10 constituent_ids. If it has only one unidentified record, it will use just that one constituent_id (as in the last example above).

This relies on the assumption that a painting will carry at most one statement of each kind: "created by unidentified", "depicts unidentified", or "depicts unidentified Man". Even if there are 10 unidentified men in the painting, I think it's fair to assume that "depicts unidentified Man" will be recorded only once. Only if that's not true would we have to add a sequential counter to the URL.
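The counter fallback can be sketched in Python, assuming usages arrive as `(object_id, role, constituent_id)` tuples. `mint_all` is a hypothetical name. The suffix is added only when the same combination occurs more than once in a record. That keeps every other URL unchanged, but the suffixed URLs stay deterministic only as long as repeated usages keep the same order across exports.

```python
from collections import Counter

BASE = "http://americanartcollaborative.org/npg/"


def mint_all(usages: list) -> list:
    """Mint one URL per usage; append a counter only for repeated (object, role, constituent)."""
    totals = Counter(usages)
    seen: Counter = Counter()
    urls = []
    for object_id, role, constituent_id in usages:
        key = (object_id, role, constituent_id)
        url = f"{BASE}object/{object_id}/{role}-unidentified/{constituent_id}"
        if totals[key] > 1:
            seen[key] += 1
            url += f"/{seen[key]}"
        urls.append(url)
    return urls


for url in mint_all([(7, "depicts", 1), (7, "depicts", 1), (7, "producedBy", 0)]):
    print(url)
# .../depicts-unidentified/1/1, .../depicts-unidentified/1/2, .../producedBy-unidentified/0
```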

Does this special treatment for minting of URIs for unidentified actors need to be documented

Yes: my experience is that in practical RDF mapping, that does matter.

azaroth42 commented 8 years ago

You would update the resource by removing the now-incorrect label, the same way you'd update the resource if you discovered further information about the actor, such as gender or nationality. It might go from being "Unidentified Person" to "Unidentified Male" to "Unidentified Bavarian Male" to then being John Smith.

I'm not too concerned about the URI; they're supposed to be opaque, after all. But in this final state the unidentified-ness is no longer true, and that's the desirable state to progress towards. As it would be same-as the identified person, hopefully the reference would no longer be from the object to the unidentified person but directly to ULAN or another canonical source. We would just maintain the URI for persistence of links to it. I would not nest it under the object. If the object is deaccessioned from the museum's collection, the information about the person would also be removed, and that information may have valuable links to it (this unidentified female appears in two paintings ... which might then aid future researchers in identifying her).

Regarding higher complexity, I disagree. It's simpler to have one resource per "thing" (and indeed a best practice by any definition of linked data) than to have one resource that identifies many things. The same "unidentified person" did not create all of the objects that the identity would be associated with, but that is what the Linked Data would imply.

Searching: Sure. We can have some class (be that an rdfs:Class or an E55 Type-not-actually-linked-data-class-actually-maybe-a-concept) of currently unidentified actors. If you want to search for unidentified Germans, you wouldn't have a class for that subset of unidentified actors, you'd have unidentified actors that have a nationality of German, thereby reusing existing predicates and instances, making the overall system more coherent.
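Searching by a type plus existing attributes, rather than by per-subset classes, can be sketched as follows. The sketch assumes a hypothetical `type/unidentified-actor` type and uses plain Python objects in place of RDF resources.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical identifier for the "currently unidentified actor" type.
UNIDENTIFIED = "type/unidentified-actor"


@dataclass
class Actor:
    uri: str
    types: set = field(default_factory=set)
    nationality: Optional[str] = None
    gender: Optional[str] = None


def search_unidentified(actors: list, **criteria) -> list:
    """URIs of actors that carry the unidentified type and match every given attribute."""
    return [
        a.uri
        for a in actors
        if UNIDENTIFIED in a.types
        and all(getattr(a, key) == value for key, value in criteria.items())
    ]


actors = [
    Actor("http://museum.org/lod/person/artist_789", {UNIDENTIFIED}, nationality="German"),
    Actor("http://museum.org/lod/person/artist_790", {UNIDENTIFIED}, nationality="British", gender="female"),
]

print(search_unidentified(actors, nationality="German"))
# ['http://museum.org/lod/person/artist_789']

# Identifying the person only removes the type; no group membership has to move.
actors[0].types.discard(UNIDENTIFIED)
print(search_unidentified(actors, nationality="German"))
# []
```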

If there are 10 different unidentified people associated (in any way) with an object, then you would need to have 10 different resources. Otherwise how does someone assert that the 10th person in The Night Watch is Lord Jehan Smythe of Springville, and allow someone else to assert that the 9th is Lady Janice Dowe? In a family portrait, where the sitters are unknown, you could make familial assertions without having explicit identities to associate with each. Without the nodes to associate the information with, systems would need to always replace rather than update for fear of making assertions about multiple agents.

VladimirAlexiev commented 8 years ago

If the painting depicts a known person, we use the institution's global constituent URL. If it is later discovered to be another person, the institution would replace the info, right? We're not "reserving" a local URL for every sitter in a portrait, and then sameAs'ing it to something global, or adding info to that local URL. As for familial relations between unknown people, it's a nice theory, but there is simply no such data to deal with.

workergnome commented 8 years ago

If you're looking for use-cases of real data involving familial relationships between unknown people, I've got them in spades with the CMOA provenance data—lots of "John Smith, 1775; by descent. Christies, London, 1950." records, where you know it was passed down to an unknown family member.

Outside the scope of the AAC, but a use case I run into all the time.

azaroth42 commented 8 years ago

If the painting depicts three unknown people, but there's only one triple of :object depicts :unknownPerson, how do you count the number of people depicted?

Or the equivalent for artists: the most prolific artist in every collection would be :unknownPerson.
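Both counting problems can be sketched with triples held in a Python set. Like an RDF graph, a set collapses identical triples. The data, the `person/smith` artist, and the short `depicts`/`producedBy` predicates (standing in for the CRM properties) are hypothetical.

```python
from collections import Counter


def graph(unique: bool) -> set:
    """Tiny hypothetical graph; as in RDF, identical triples collapse into one."""
    g = set()
    for n in range(1, 4):  # object/1 depicts three unknown sitters
        sitter = f"object/1/depicts/{n}" if unique else "unknownPerson"
        g.add(("object/1", "depicts", sitter))
    for obj in ("object/1", "object/2", "object/3"):  # three works by unknown artists
        artist = f"{obj}/producedBy/1" if unique else "unknownPerson"
        g.add((obj, "producedBy", artist))
    for obj in ("object/4", "object/5"):  # two works by one known artist
        g.add((obj, "producedBy", "person/smith"))
    return g


def depicted_count(g: set, obj: str) -> int:
    return sum(1 for s, p, _ in g if s == obj and p == "depicts")


def most_prolific(g: set) -> tuple:
    works = Counter(o for _, p, o in g if p == "producedBy")
    return works.most_common(1)[0]


for unique in (False, True):
    g = graph(unique)
    print(unique, depicted_count(g, "object/1"), most_prolific(g))
# False 1 ('unknownPerson', 3)
# True 3 ('person/smith', 2)
```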

VladimirAlexiev commented 8 years ago

Simple answer: you can't count it precisely. Rob, is there an object record that references one of the Unknown constituents more than once? I bet you $5 there isn't.

azaroth42 commented 8 years ago

I bet there isn't either, but I would also bet that there aren't name appellations and measurement events and ... :) With my suggestion, you can count it precisely if the information is available, which it might be in the future even if it isn't now.

edgartdata commented 8 years ago

Actually, at the YCBA we have about 2,000 works by unknown artists after unknown artists. I think this is not that infrequent, especially in collections of reproductive prints: http://collections.britishart.yale.edu/vufind/Record/3641507

azaroth42 commented 8 years ago

I guess we both owe Emmanuelle $5 :)

So without unique unknown people, this would become the nonsensical statement that an unknown artist made a work after their own work.

edgartdata commented 8 years ago

With unknown actors, it is highly probable that they were distinct artists. However, in the 18th century and into the 19th, it was not unusual for artists to create prints after their own paintings. It was a way to let the world know about their new work, and it could provide steady revenue between big commissions for paintings. And of course there is Turner's Liber Studiorum, where his prints are after his own watercolors and drawings: http://www.tate.org.uk/art/research-publications/jmw-turner/liber-studiorum-drawings-and-related-works-r1131702#synopsis So both cases happen! I hope there is a good coffee shop near ISI...

VladimirAlexiev commented 8 years ago

Rob, with the URLs I proposed (which are unique per object and per role) such smushing of unknowns won't happen:

<object/1> crm:P108i_was_produced_by <object/1/production>.
<object/1/production>
  crm:P14_carried_out_by <object/1/producedBy-unidentified/0>;
  crm:P15_was_influenced_by <object/1/after-unidentified/0>.

<object/1/producedBy-unidentified/0> a crm:E39_Actor;
  crm:P107i_is_current_or_former_member_of <http://americanartcollaborative.org/thesaurus/agent/unidentified>.
<object/1/after-unidentified/0> a crm:E39_Actor;
  crm:P107i_is_current_or_former_member_of <http://americanartcollaborative.org/thesaurus/agent/unidentified>.

Emmanuelle, if we don't know two people are the same, we shouldn't use the same URL. If later someone learns they are the same, they can say sameAs.
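One common way for a consumer to apply such owl:sameAs statements is to merge URIs into equivalence classes with a union-find structure. The Python sketch below uses the URIs from the John Smith example. The class is hypothetical. It assumes the second argument's class supplies the canonical URI, whereas a real system might always prefer the museum's global constituent URL.

```python
class SameAs:
    """Union-find over URIs: each owl:sameAs link merges two equivalence classes."""

    def __init__(self) -> None:
        self.parent: dict = {}

    def find(self, uri: str) -> str:
        """Return the canonical URI of uri's class (uri itself if it was never linked)."""
        self.parent.setdefault(uri, uri)
        while self.parent[uri] != uri:
            self.parent[uri] = self.parent[self.parent[uri]]  # path halving
            uri = self.parent[uri]
        return uri

    def same_as(self, a: str, b: str) -> None:
        """Record a owl:sameAs b; b's representative becomes canonical for the merged class."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_a] = root_b


links = SameAs()
links.same_as("object/1/depicts-unidentified/1", "constituent/5")
print(links.find("object/1/depicts-unidentified/1"))   # constituent/5
print(links.find("object/1/producedBy-unidentified/0"))  # unlinked, so it maps to itself
```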

azaroth42 commented 8 years ago

Sure! I don't particularly care about the URL pattern, just that the URLs should all be unique for unidentified agents :)