TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

New `<entity>` and `<listEntity>` elements are needed #2341

Open: martindholmes opened this issue 1 year ago

martindholmes commented 1 year ago

Arising out of issue #2295, Council believes that we should create a more generic <entity> element, of which all the specific elements we already have (<person>, <org>, etc.) are syntactic sugar. This way, whatever your ontological preferences in dividing up the world, you can use <entity type="organism"> or whatever else you need. A small working group will probably be needed to work out the content model of <entity> and other details.
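
For illustration only, here is a rough sketch of what that "syntactic sugar" relationship might look like. The first fragment is existing TEI P5 markup. In the second, <listEntity> and <entity> are the proposed elements, which do not exist yet, and the child <name> element is an assumption, since the content model is exactly what the working group would need to decide.

```xml
<!-- Existing TEI P5: a specific element for persons -->
<listPerson>
  <person xml:id="pers01">
    <persName>Vasco da Gama</persName>
  </person>
</listPerson>

<!-- HYPOTHETICAL: <listEntity> and <entity> are proposed, not part of TEI P5.
     The use of <name> inside <entity> is an assumed content model. -->
<listEntity>
  <!-- what <person> would be "sugar" for -->
  <entity xml:id="ent01" type="person">
    <name>Vasco da Gama</name>
  </entity>
  <!-- a category with no dedicated element today -->
  <entity xml:id="ent02" type="organism">
    <name>dog</name>
  </entity>
</listEntity>
```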

juanacevedo commented 10 months ago

Hello, I know that issue #2295 (Add an organism element) has been closed now. I've read the discussion and I was wondering, wouldn't an element <bioName> be useful? Patterned after <persName> and <placeName>, it could apply to what is currently under the purview of binomial nomenclature (broadly speaking, plants and animals). The fact that binomial nomenclature is regulated by internationally agreed and maintained codes would be of great help towards the uniformity and interoperability of the data.

I now work from Lisbon on early modern travel/nautical literature, and a biologist in our team would like to start tagging plants and animals in the text. The proposed <entity type="organism"> may seem appropriate from a content-modelling point of view, but when it comes down to the concrete nitty-gritty of "organism" identification, it would be as cumbersome and unspecific as it was to use <name type="person"> for persons.

Since we would just like to make sure we stick to agreed and common usage, here is a more general related question: does the Council F2F decision of 2022-09-13 to "create a more generic <entity> element" signal a turn towards using more generic elements? I mean, should we use <name type="person"> elements instead of <persName> and so on?

lb42 commented 10 months ago

I cannot speak for current Council decisions, but I certainly think it would be a big mistake to adopt a policy of only creating "more generic" elements. (A reductio ad absurdum would be to say that we really need only one TEI element: <tag> with a @type attribute.) That's not to say that it would not be useful to create generic elements: they provide an essential safety net where the terminology or modelling praxis of a given field has not yet stabilised. I really don't think that's the case for biological terminology, though! So I would strongly urge you and your colleagues to propose more specific elements appropriate to your needs, and lobby for their inclusion in a future release.

juanacevedo commented 10 months ago

Thanks. It took me some time to come back because I wanted to confer with a colleague on some of the implications of what we propose, to try and think it through a little more, also from the perspective of the biologist.

We would specifically like to propose the inclusion of a new element <bioName> in a future release. As mentioned above, we conceive of it as patterned after <persName> and <placeName>, and we think it could apply to what is conventionally the object of binomial nomenclature. The fact that binomial nomenclature is regulated by international bodies would help towards the uniformity and interoperability of the data.

We are aware of several DH projects on early modern corpora of travel/geographic/nautical literature which, like our own, abound in descriptions of the natural environment. As we start analysing and tagging our documents, we increasingly feel there is a case for the usage of a dedicated element.

Having done some practical experiments, we are right now imagining this kind of tagging: <bioName type="animal" subtype="domestic">dog</bioName>, but we would very much appreciate feedback, suggestions, and criticism.

ebeshero commented 10 months ago

Greetings, @juanacevedo! I was the one who opened the ticket calling for <organism>, and the proposal underwent some intense discussion in Council last year around this time! We emerged from the discussion with a path forward and a new ticket, calling for <entity> and <listEntity> elements. The idea is that we can then adapt the element to give it an @type of organism, etc. as needed.

ebeshero commented 10 months ago

@juanacevedo Indeed, as you say, Council is trying to find a way to organize named entities that is helpfully multipurpose. When we move into the realm of classifying life forms that are not necessarily persons, we found <entity> a little more flexible and adaptable for encoding than <organism>. It was indeed a bit of a paradigm shift to think this way about the markup!

ebeshero commented 10 months ago

@juanacevedo That said, a proposal for <bioName> would be interesting to consider, and it could even point to canonical <entity> records in a <listEntity>. We could also revive discussion of <organism>, though I will point out that @lb42 seemed not much in favor of that particular element at the time, partly due to the complication of an implied <organismName> to go with it--almost certainly to be confused with <orgName> (completely different: the name of an organization).
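
A hypothetical sketch of that pointing idea, assuming <bioName> would take the usual @ref pointer (as <persName> and <placeName> already do) and that a <listEntity> could sit, for example, inside <standOff>. None of <bioName>, <entity> or <listEntity> exists in TEI P5 yet; the sample sentence, the identifiers and the @type values on <name> are invented for illustration.

```xml
<!-- HYPOTHETICAL: <bioName> is a proposed element, not part of TEI P5 -->
<p>They saw a <bioName ref="#org-dog">dog</bioName> on the shore.</p>

<!-- HYPOTHETICAL canonical record; <listEntity>/<entity> are proposed elements.
     Placement inside <standOff> is one possible choice, not a settled rule. -->
<standOff>
  <listEntity>
    <entity xml:id="org-dog" type="organism">
      <!-- assumed @type values distinguishing regulated vs. everyday names -->
      <name type="binomial">Canis familiaris</name>
      <name type="vernacular">dog</name>
    </entity>
  </listEntity>
</standOff>
```

Keeping the binomial name on the canonical record, rather than on every occurrence in the text, would mean the regulated nomenclature only has to be recorded once.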

The naming of entities, humanoid or not, with personhood or not, is a curious complexity. Our discussion last year at Newcastle was eye-opening--sometimes less determination creates a more adaptable structure for people encoding texts about life forms of all kinds.