TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

Add an organism element #2295

Closed ebeshero closed 1 year ago

ebeshero commented 2 years ago

In working on #2189 and #2190, we find a need for an <organism> element as distinct from a <person> element to assist in encoding plant, animal, and other life forms that may have biological sex. This is a need in many projects which have otherwise been customizing <person> and <list> to deal with the lack of expressivity in our Guidelines on this matter.

sydb commented 2 years ago

I am not expressing a significant opinion one way or the other on <organism> itself. But I am about to make a “slippery slope” argument against. The reason this is not truly expressing a significant opinion is that, as many readers of this conversation will know, I do not think slippery slope arguments hold much water. But they ain’t nuthin’, neither.


But where does it end? TEI has various elements for describing various named entities (people, places, organizations, and names themselves) because, back in 2006 or so, we perceived there was a need. There was (at least some believed) sufficient demand for a mechanism for a) providing disambiguation of named entities, and b) providing a place for project-specific definitions of, or at least additional information about, entities named in encoded humanities documents. Furthermore (since long before 2006) TEI provides a generic mechanism for providing these (see the 3rd example of 3.6.3, or this snippet of a tutorial). By “sufficient demand” I mean that enough humanities scholars were interested in recording this sort of information that it was appropriate to come up with a standard, so that their encodings would be more interchangeable. I submit history has taught us that those who predicted sufficient demand for at least people and places, and, to a lesser extent, organizations were correct. (I am not as confident about names, because <nym> is pretty much only used in specialized fields with which I no longer have significant interaction. @gabrielbodard, @emylonas, @hcayless would know better than I. But it hardly matters. Even if <listNym> is not used, 3 out of 4 is pretty good.)

Now if @ebeshero et al. say there are projects that could make use of <organism>, I am prepared to believe them and be in favor. But it is worth keeping in mind that unless there is more than one project which will use the same sort of structure, a standardized encoding is not particularly helpful. So <organism> addresses the needs of those scholars who want to encode information about flora, fauna, and aliens (and, I daresay, organisms that do not have sex, like bacteria, mold, fungi, and of course everybody’s favorite, virus variants). But what about those who want to disambiguate and detail movies? Planets? Rocks? Foods (in particular cooking ingredients)? Bones? Arteries? Tanks? Medications or poisons? Paper sizes? Monetary systems? Musical tracks? (A Phish group once embarked on the task of encoding the play list of every Phish concert, ever. I tried to talk them into using SGML, if not TEI. None other than Yuri Rubinsky himself told me that if I was successful, he would give them a free copy of Author/Editor. Perhaps if TEI had a dedicated play list element, instead of just <list> …)

ebeshero commented 2 years ago

@sydb Indeed, we are aware of projects and of cultural heritage documents (such as medieval bestiaries, Erasmus Darwin's The Loves of Plants, H. G. Wells' The War of the Worlds, among many others), that take the world of non-humanoid entities very seriously indeed, even in a classificatory way. The letters of Mary Russell Mitford encoded in The Digital Mitford project have long been relegated to dereferencing our markup of geraniums, anemones, and other English garden plants as simply <item> in a generic <list> in our extensive prosopography, and pet animals using <person> simply because they are given names (so we arbitrarily adopt them into the world of persons because we're encoding personographies). We are aware of having heard a TEI conference paper (I believe the one from Karen Bourrier and Kailey Fukushima from the Digital Dinah Craik project) on the special problem of encoding representations of animals in the TEI: we have no tag set for them; we are decidedly concentrated on persons. And yet we text scholars care about documents that explore the non-human world.
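The workaround described above might look roughly like the following. This is only an illustrative sketch using existing TEI elements; the `xml:id` values, the `type` values, the pet name, and the note text are invented for the example and are not taken from the Digital Mitford files.

```xml
<!-- Hypothetical sketch: all identifiers, type values, and names are invented -->
<list type="plants">
  <!-- A plant recorded as a generic list item, for want of a better element -->
  <item xml:id="geranium">
    <name>Geranium</name>
    <note>A garden plant mentioned in the letters.</note>
  </item>
</list>

<listPerson type="pets">
  <!-- A pet recorded as a person only because it has been given a name -->
  <person xml:id="pet_fido">
    <persName>Fido</persName>
  </person>
</listPerson>

<!-- Inline references pointing at the two entries above -->
<p>... the <rs type="plant" ref="#geranium">geraniums</rs> ...
  <persName ref="#pet_fido">Fido</persName> ...</p>
```

Neither entry has a natural place for biological information such as sex or species, which is the gap the proposed element is meant to fill.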

Our group working on the encoding of <sex> and <gender> came to a realization today that the word organism captured something vital and simple: a form of life that need not delineate itself in terms of person-hood or humanity, but could encompass animal, plant, fungal, alien, any life form. And given the association of biological sex with life forms and gender with persons and human culture, we recognized the significance of that word organism and how widely it might carry. We believe we can keep the content model of <organism> fairly simple, working with existing prosopography element structures and can adapt from existing prosopography "hacks" we've had to impose to make more expressive encoding.

ebeshero commented 2 years ago

As for monetary systems, @sydb, you certainly remember the work we did on <unitDecl>, but that is beside the point. We call here for an <organism> element because our existing Guidelines give us only <person>, and that is not good enough. To address your "slippery slope" question, I should remind us of the <object> element as well: https://tei-c.org/release/doc/tei-p5-doc/en/html/ND.html#NDOBJ. We already have a means of encoding much of the non-living stuff on your list.

duncdrum commented 2 years ago

Let’s not forget that in historical documents both mythical and non-mythical beasts appear. object and Name seem equally suboptimal, compared to the proposed organism.

jamescummings commented 2 years ago

To address the questions of non-human beasts, aliens, etc. I remember asking this question during the P5 updating of named entity elements and being told very certainly that mythological creatures, deities, beasts, aliens, sentient robots, etc. all should be described using 'person'.

If you are going to tell me that Ratty and Moley from The Wind in the Willows are not people, we'll have a big argument indeed.

If they aren't a person, but a fern, or geranium, then it seems obvious to me that these are <object>s.

duncdrum commented 2 years ago

I'd say the question is whether the document at hand treats these entities as persons or not. Personlike and objectlike treatment of animals can coexist within the same text. My political views about animal personhood aside, I think a third category between person and stuff makes perfect sense.

sydb commented 2 years ago

I am with you, @jamescummings, that many an alien, the Sphinx, Winnie-the-Pooh, Peter Rabbit, and even Bambi can all be encoded as <person>. But thinking of Lassie or Buck (from The Call of the Wild) as a person is more of a stretch; and thinking of Captain Cook (from Mr. Popper’s Penguins) as a person seems outright problematic (if I remember the story correctly). In any case, thinking of a plant as an <object> is a bit weird, especially when you want to assign sex. And while Firenze is obviously a <person>, despite being half horse, what about Buckbeak? You would certainly offend the noble hippogriff if you suggested encoding him as an <object>, but I do not think <person> is appropriate, either.

HelenaSabel commented 2 years ago

I agree with @duncdrum and @sydb: I think that the issue arises when we need to encode animals and plants without personification.

For example, @ebeshero mentions bestiaries, and I think that is a good example of descriptions of animals that present several traits that an encoder would like to encode by means of <sex> and <trait> without having to use <person> (because there is not always a personification in the description of the animals, although it of course depends on the bestiary). The traits are in most cases physiological, and even when they are not (e.g. does described as “timid”) it is hard to justify a personification.
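A bestiary entry of this kind could perhaps be sketched as follows. Note that <organism> is only the element proposed in this ticket, so its attributes and content model here are assumptions, as are the identifier and the `type` values; <sex> and <trait> are used the way they already work inside <person>.

```xml
<!-- Hypothetical: <organism> is proposed in this ticket and is not part of TEI P5 -->
<organism xml:id="bestiary_doe" type="mammal" subtype="deer">
  <name>doe</name>
  <!-- Biological sex recorded without implying personhood -->
  <sex value="F">female</sex>
  <!-- A non-physiological trait that still does not amount to personification -->
  <trait type="temperament">
    <desc>timid</desc>
  </trait>
</organism>
```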

lb42 commented 2 years ago

Two, no, three questions:

  1. Is the proposal also to create an "organismName" element for the special case of a name which refers to an organism? (I spy endless confusion with "orgName" in the distance)
  2. Are "person" "place" and "object" to be considered special cases of "organism" (special because their content models are slightly different), or ontologically distinct?
  3. Now that we've all stopped using the word in the SGML sense, how about calling the generic thingy which holds metadata for some named entity an entity?

ebeshero commented 2 years ago

@lb42 for the case of referencing the name of an organism, I agree that <organismName> is too much and confusing! But I think we can handle this with just <name type="organism"> and work with existing elements only, and that may answer your second question about ontological relations.

I am not sure we want the more general <entity> instead of <organism>, since entity need not have a biological existence. But perhaps entity as in thing that can be named is a more general-purpose construction for another ticket.

lb42 commented 2 years ago

I've no wish to reignite an old argument (especially one I lost), BUT it occurs to me that animals are frequently given names indistinguishable from those of people (e.g. "Empress of Blandings", "Bustopher Jones", any racehorse you fancy...). As indeed are places, particularly pubs ("Empress of Blandings" (again), "The Rupert Brooke", "General Elliott"). Any simple-minded typology for <name> risks falling apart when it confounds the type of name it is (a personal name, a place name, a nickname etc.) with the type of object it's referencing (humanoid, animal, building): that's what @role is for. I would naturally use <persName> for the Empress when I mean the pig, and <placeName> when I mean the pub: am I therefore allowed to use <name type="person"> rather than <name type="organism">, if the @ref or @key attached points to an <organism> rather than a <person>?
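To make the question concrete, here is a minimal sketch of the pig-and-pub case. It assumes the <organism> element proposed in this ticket exists and accepts @type and @subtype; the identifiers and the `type` values are invented for illustration.

```xml
<!-- Hypothetical: <organism> is only proposed here; identifiers are invented -->
<organism xml:id="empress_pig" type="mammal" subtype="pig">
  <name>Empress of Blandings</name>
</organism>
<place xml:id="empress_pub" type="pub">
  <placeName>Empress of Blandings</placeName>
</place>

<!-- Inline: the element says what kind of name it is, @ref says what it refers to -->
<p>... <persName ref="#empress_pig">Empress of Blandings</persName> ...
  <placeName ref="#empress_pub">Empress of Blandings</placeName> ...</p>
```

In this reading the same string is tagged by the kind of name it is, while the referent's category is carried entirely by the entry that @ref points to.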

ebeshero commented 2 years ago

@lb42 I don't see why not! It seems to me that the inline markup could classify the name as a <persName>, or <name type="person"> or <name type="organism"> as needed with an @ref pointing to a distinct identifier in an @xml:id on <organism>.

I was thinking about what would go inside the <organism> in a prosopography entry. I definitely agree we wouldn't really want <organismName>, but <name> should be just fine on its own there (just as it would be in any other prosopography entry for <person>, <place>, <org>, <event>). I believe we had better, though, make <organism> a member of att.typed like <object> and <place>, but not like <person>. That gives us a mechanism for distinguishing animals, plants, fungi, etc.

<organism xml:id="dolly" type="mammal" subtype="sheep" sex="F">
  <name>Dolly</name>
  <name>Dolly the sheep</name>
  <birth when="1996-07-05"/>
  <death when="2003-02-14"/>
  <desc>Not the first cloned mammal, but the first mammal ever cloned from an adult animal cell
    to make an exact biological copy. Her cloned DNA came from the mammary cell of a
    Finn Dorset sheep and was implanted in an egg cell from a Scottish blackface sheep.</desc>
</organism>

Inline:

<name ref="#dolly">Dolly the sheep</name> lived for six years...

sydb commented 2 years ago

In response to @lb42’s three questions:

  1. I hope not.
  2. I do not think so. I think <person> is a special case of <organism> in a semantic (not necessarily content-model) sense, but <place> and <object> (and <org> and <nym>) are not.
  3. I think you mean that we might start using the word “entity” to refer to the things named by named entities. I am not sure if you mean in our colloquial usage or in the Guidelines. I think it is a good enough idea that I am in favor of both.

sydb commented 2 years ago

In response to @lb42’s latest comment:

I am not sure what the old argument was, or which side you were on, and I only care out of an interest in the history of TEI. As it stands now, we may wish that @role was how TEI indicated the type of object the encoded word or phrase is referencing, but it is not. @type, perversely, has that task. The @role attribute is used

to specify further information …, for example the occupation of a person, or the status of a place

Furthermore, that is how @role is exemplified in the Guidelines. Here are all the occurrences of elements of interest (i.e., not <cell>, <row>, <editor>, nor <roleName>) in examples that have a @role attribute.

  <name role="politician" type="person">David Paul Brown</name>

  <person sex="intersex" role="god" age="immortal">
    <persName>Hermaphroditos</persName>
    <persName xml:lang="grc">Ἑρμαφρόδιτος</persName>
  </person>

  <person xml:id="fr_Ovi01" sex="1" role="poet">
    <persName xml:lang="en">Ovid</persName>
    <persName xml:lang="la">Publius Ovidius Naso</persName>
    <birth when="-0044-03-20">
      20 March 43 BC
      <placeName>
        <settlement type="city">Sulmona</settlement>
        <country key="IT">Italie</country>
      </placeName>
    </birth>
    <death notBefore="0017" notAfter="0018">
      17 or 18 AD
      <placeName>
        <settlement type="city">Tomis (Constanta)</settlement>
        <country key="RO">Roumanie</country>
      </placeName>
    </death>
  </person>

  <!-- Same Ovid xmp in Taiwanese -->

  <!-- Same Ovid xmp in English -->

  <personGrp xml:id="pg1" role="audience" sex="mixed" size="approx 50"/>

  <!-- Same audience xmp in French -->

  <persName role="artist">Juan O'Gorman</persName>

  <!-- Same Juan O’Gorman xmp with different whitespace -->

I think using @type for classifying the thing being named or referred to for naming elements and <rs>, but using @type for classifying the element elsewhere was an absolutely huge mistake, which IIRC was present in P3, but not so much in P2. (I feel somewhat guilty that I did not notice this when I was doing a pre-publication review of P3 for you in c. 1992. I have always thought that the attribute for classifying the thing being named or referred to should be @of. :-)

martindholmes commented 1 year ago

Council F2F 2022-09-13 believes the solution is actually that, rather than creating elements for more specific types of things, it would be better to create a more generic <entity> element, of which all the specifics we have (<person>, <org> etc.) are syntactic sugar. This way, whatever your ontological preferences in dividing the world, you can use <entity type="organism"> or whatever. This ticket should be closed and a new one raised for creating <entity> and <listEntity>. A small working group will probably be needed to work out the content model of <entity> and other details.
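As a rough sketch of what that might look like, here is the Dolly entry from earlier in this thread recast with the generic element. Both <listEntity> and <entity> are only proposed at this point, so the content model shown is a guess that simply borrows children from <person>; the working group may well decide differently.

```xml
<!-- Hypothetical: <listEntity> and <entity> are proposed, not yet defined;
     the children allowed inside <entity> are an assumption -->
<listEntity>
  <entity xml:id="dolly" type="organism" subtype="sheep">
    <name>Dolly</name>
    <birth when="1996-07-05"/>
    <death when="2003-02-14"/>
  </entity>
</listEntity>
```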

martindholmes commented 1 year ago

This is now replaced by issue #2341.