american-art / npg

National Portrait Gallery
Creative Commons Zero v1.0 Universal

NPGConstituents: Name fields #39

Closed steads closed 7 years ago

steads commented 8 years ago

The use of E82 Actor Appellation is incorrect. These are instances of E41 Appellation applied to instances of E39 Actor, NOT instances of E82 Actor Appellation, because they are not characteristically of a form that indicates they are being applied to an instance of E39 Actor.

The use of P106 is composed of (forms part of) to decompose the appellations into constituent parts is correct. However, many of the constituent parts of names are not appellations but descriptions of group membership. Groups need to be separated from individuals; then the true meaning of the component parts, their relationships to each other, and their relationship to the instance of E39 Actor can be assessed. Indirect naming (daughters of; man of x family; etc.) does not show the genuine nature of the real-world objects so named. => More sophisticated rules and mapping are preferable.

DisplayName, NameTitle, FirstName, MiddleName, LastName, Suffix, AlphaSort and Institution are probably all appellations and can be properly mapped once the mixture of E21 Person and E74 Group is resolved and the various membership relationships have become clearer. There could also be P1 links between the constituent appellation parts and E39 or E21. This would make some kinds of integration easier.

VladimirAlexiev commented 8 years ago

| DisplayName | FirstName MiddleName LastName | NOTE |
| --- | --- | --- |
| Ted Bair and Harvey Filister | Ted Bair and Harvey NULL Filister | Ted Bair is a full name, not first name |
| Tench & John Frazer | Tench & John Frazer NULL | Tench & John are two people |
| Teodoro de Croix | Teodoro de Croix | de Croix is the last name |
| Teresa J And Grace Fitch | Teresa J And Grace NULL Fitch | Teresa J and Grace are 2 people |
| Marianne Von Geyer Belcher | Marianne Von Geyer Belcher | She has two last names: Von Geyer (maiden) & Belcher (married). http://www.biographi.ca/en/bio/belcher_edward_10E.html |
| Wilhelm von Moll Berczy | Wilhelm von Moll Berczy | von Moll is a former last name, see https://en.wikipedia.org/wiki/William_Berczy |
| Gebhard Leberecht Von Blücher | Gebhard Leberecht Von Blucher | Von Blücher is the last name |

```turtle
<http://americanartcollaborative.org/npg/person-institution/0/appellation> a crm:E82_Actor_Appellation ;
    rdfs:label "Unidentified Artist" ;
    crm:P106_is_composed_of <http://americanartcollaborative.org/npg/person-institution/0/appellation/lastname> .

<http://americanartcollaborative.org/npg/person-institution/0/appellation/lastname> a crm:E82_Actor_Appellation ;
    rdfs:label "Unidentified Artist" ;
    crm:P2_has_type <http://americanartcollaborative.org/npg/thesauri/nametype/lastname> .
```

steads commented 8 years ago

Instances of E82 Actor Appellation must, by their very nature, be identifiable as appellations of actors rather than be appellations which are used to identify actors. Things like ULAN identifiers, social security numbers and company numbers are examples of E82.

You have correctly identified even more of the many problems in the splits!

There is a much bigger problem with the "Unidentified ...." in that the same "person" is used to stand in for all unidentified people!!

I agree that it is madness to create "last names" out of things which do not have such a concept like "unidentified..." and company names.
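
One way to picture the rule being argued for here, as a sketch only: type every appellation as E41, and emit name parts only when the record is a person and the part is actually present. The field names come from the export columns discussed above; the `is_person` flag, the base URI, and the function name `appellation_triples` are hypothetical and not part of the actual AAC mapping.

```python
BASE = "http://example.org/npg/person-institution/"  # hypothetical base URI
PART_FIELDS = ("FirstName", "MiddleName", "LastName")


def appellation_triples(record_id, record, is_person):
    """Build (subject, predicate, object) triples for one constituent's name.

    `record` maps export column names to values; None, "" or "NULL" mean empty.
    `is_person` is an assumed flag: groups, institutions and "Unidentified ..."
    placeholders get a whole-name appellation with no parts.
    """
    app = f"{BASE}{record_id}/appellation"
    triples = [
        (app, "rdf:type", "crm:E41_Appellation"),
        (app, "rdfs:label", record["DisplayName"]),
    ]
    if not is_person:
        return triples
    for field in PART_FIELDS:
        value = record.get(field)
        if value in (None, "", "NULL"):
            continue  # skip empty parts rather than inventing them
        part = f"{app}/{field.lower()}"
        triples.append((app, "crm:P106_is_composed_of", part))
        triples.append((part, "rdf:type", "crm:E41_Appellation"))
        triples.append((part, "rdfs:label", value))
    return triples


unidentified = {"DisplayName": "Unidentified Artist", "LastName": "Unidentified Artist"}
for triple in appellation_triples(0, unidentified, is_person=False):
    print(triple)
```

How to decide `is_person` is exactly the open question in this thread (separating E21 Person from E74 Group), so the flag is left to the caller here.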

VladimirAlexiev commented 8 years ago

Hold on, "John Smith" is not an actor appellation??

Agreed about Unidentified; maybe post a separate issue for it?

steads commented 8 years ago

No, "John Smith" is not an E82 Actor Appellation.

edgartdata commented 8 years ago

Can we come back to E82 versus E41? Is that a deal breaker?

Also, I am not sure that modeling appellation parts (first name, middle name, last name) separately is useful, unless someone researches naming trends, I guess. Wouldn't it be more useful to have the literal/label with the actor's full name?

edgartdata commented 8 years ago

Can we work on a solution for unidentified actors? It will be important because otherwise we will leave many actor entities out of the AAC browsing app.

steads commented 8 years ago

Each Unidentified Actor must have its own separate unique identity, and all of them share the same instance of E41 Appellation {Unidentified}.
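
As a rough illustration of that pattern, the following Python sketch mints one actor URI per object while pointing all of them at a single shared appellation. The base URI, the path segments, and the function name `unidentified_actor_triples` are made up for illustration; only the `crm:` class and property names follow the usage elsewhere in this thread.

```python
BASE = "http://example.org/npg/"  # hypothetical base URI, not the real AAC namespace
SHARED_APPELLATION = BASE + "appellation/unidentified"  # one E41 shared by all actors


def unidentified_actor_triples(object_ids):
    """Return (subject, predicate, object) triples, one distinct actor per object id."""
    triples = [
        (SHARED_APPELLATION, "rdf:type", "crm:E41_Appellation"),
        (SHARED_APPELLATION, "rdfs:label", "Unidentified"),
    ]
    for object_id in object_ids:
        # Keying the actor URI on the object id keeps each unidentified actor separate.
        actor = f"{BASE}actor/unidentified/{object_id}"
        triples.append((actor, "rdf:type", "crm:E39_Actor"))
        triples.append((actor, "crm:P1_is_identified_by", SHARED_APPELLATION))
    return triples


for s, p, o in unidentified_actor_triples(["obj-185", "obj-3887"]):
    print(s, p, o)
```

Deriving the actor URI from the object id is only one possible choice; it assumes at most one unidentified actor per object, which may not hold for every record.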

edgartdata commented 8 years ago

So E39 or E21 individual Unidentified Actor -> P1_is_identified_by -> E41 Appellation {Unidentified}, where each E39 or E21 individual Unidentified Actor gets a unique URI?

For example one such actor may be http://collection.britishart.yale.edu/id/page/person-institution/185 and another one may be http://collection.britishart.yale.edu/id/page/person-institution/3887 (made up URIs).

The problem with that, at least at the YCBA, is that all individual unidentified actors are lumped together into one TMS authority record (this does not make sense semantically, but it is practical for managing traditional pre-RDF data; otherwise our db would have many similar Unidentified Actor authority records). So should we put a process on the URIs for these Unidentified Actors so that whenever they are linked to a different object they get a different URI? There must be a better solution!

steads commented 8 years ago

> Can we come back to E82 versus E41? Is that a deal breaker?

Nope, just use E41; it is the superclass of E82 anyway.

VladimirAlexiev commented 8 years ago

Hi @steads! I still don't understand why 500010654 is more "identifiable as an appellation of an actor" (to me it looks like a mere number) than "Michelangelo Buonarroti" is.

@edgartdata: yes, strict modeling would require making unique "Unidentified" URLs per object. I think we need a separate issue for that?

steads commented 8 years ago

@VladimirAlexiev cannot help with your understanding!! Just telling you what the CRM definition says.

workergnome commented 8 years ago

@steads, I'm trying to understand where in the CRM definition that is. If I look at http://new.cidoc-crm.org/Entity/e82-actor-appellation/version-6.2.1, I see a list of examples which include things like "John Doe" or "International Council of Museums", but nothing that looks like a raw social security number (though a string including a social security number is included).

Is it possible that I'm looking in the wrong place?

si-npg commented 8 years ago

All of Alexiev's examples of "mistakes" are constituents that should not be included in our data at all. It will help to get NPG's second export uploaded so that no time is wasted analyzing data that won't be included. I will contact Craig, who has the cleaner data in Dropbox.

VladimirAlexiev commented 8 years ago

Quickly checked NPG_LOD.zip. 12.5 records.

The parsing is much improved, eg

Some problems: