TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

`<standOff>` should be allowed to contain `<taxonomy>` #2530

Open sydb opened 4 months ago

sydb commented 4 months ago

Per Georg Vogeler’s suggestion on TEI-L (itself based on a comment by @martindholmes), this issue is a suggestion to allow <taxonomy> as a child of <standOff>. (Probably by adding it to model.standOffPart.)
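A minimal sketch of what the proposal would permit. It is not valid against the current Guidelines, and the taxonomy, its categories, and every `xml:id` here are invented for illustration:

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample document</title></titleStmt>
      <publicationStmt><p>Unpublished illustration.</p></publicationStmt>
      <sourceDesc><p>Born digital.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <standOff>
    <!-- PROPOSED ONLY: <taxonomy> is not currently allowed as a child of <standOff> -->
    <taxonomy xml:id="speechTypes">
      <category xml:id="speech.direct">
        <catDesc>Direct speech</catDesc>
      </category>
      <category xml:id="speech.indirect">
        <catDesc>Indirect speech</catDesc>
      </category>
    </taxonomy>
  </standOff>
  <text>
    <body>
      <!-- The transcription points at a category defined in <standOff> -->
      <p><seg ana="#speech.direct">An example passage.</seg></p>
    </body>
  </text>
</TEI>
```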

NOTE — This issue is not about adding <div> to the content of <catDesc>.

lb42 commented 4 months ago

I confess to some disquiet at this proposal. Is it really a good idea to allow taxonomy anywhere in addition to where it is currently defined? What's wrong with requiring it within the TEI Header encodingDesc? If both are permitted, on what basis does the encoder decide whether to define their taxonomy within standOff rather than there? Is it really helpful to multiply the possible valid locations for this particular piece of metadata? Or does it just make it easier for header-haters to do without one?

sydb commented 4 months ago

> is it really helpful to multiply the possible valid locations for this particular piece of metadata?

Yes, I think so. I think the use case of dozens or hundreds of files at a given project that use the same taxonomy (or even taxonomies) is so common that it makes a lot of sense to allow it someplace other than the <teiHeader>. Yes, one could argue that all the TEI files that share a taxonomy are at least in some sense a corpus, and thus should be organized as a <teiCorpus>, and the common taxonomy should be encoded as the /teiCorpus/teiHeader/encodingDesc/classDecl/taxonomy. But I suspect for a lot of projects there is little to no other reason to actually generate a <teiCorpus> file at all, and thus it would make their lives quite a bit easier to tuck this information into a TEI/standOff.
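For comparison, the corpus-level arrangement at /teiCorpus/teiHeader/encodingDesc/classDecl/taxonomy might look roughly like the sketch below. The titles, categories, and identifiers are invented for illustration; only the element nesting is taken from the path above:

```xml
<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample corpus</title></titleStmt>
      <publicationStmt><p>Unpublished illustration.</p></publicationStmt>
      <sourceDesc><p>Born digital.</p></sourceDesc>
    </fileDesc>
    <encodingDesc>
      <classDecl>
        <!-- Shared taxonomy, declared once for every member document -->
        <taxonomy xml:id="genres">
          <category xml:id="genre.letter">
            <catDesc>Letter</catDesc>
          </category>
        </taxonomy>
      </classDecl>
    </encodingDesc>
  </teiHeader>
  <TEI>
    <!-- Each member document still carries its own header -->
    <teiHeader>
      <fileDesc>
        <titleStmt><title>Sample member document</title></titleStmt>
        <publicationStmt><p>Unpublished illustration.</p></publicationStmt>
        <sourceDesc><p>Born digital.</p></sourceDesc>
      </fileDesc>
    </teiHeader>
    <text>
      <body>
        <p>An example letter.</p>
      </body>
    </text>
  </TEI>
</teiCorpus>
```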

lb42 commented 4 months ago

Well, but your argument for using teiCorpus as parent really doesn't seem to me relevant. Unless I am misunderstanding completely, a standOff element has to be contained by a TEI or a teiCorpus. In which case it must have a sibling teiHeader, which is where I would expect to see all the relevant metadata gathered together, including any relevant taxonomy/ies. I would be happier actually putting the standOff inside the teiHeader too, but that train seems to have left.

laurentromary commented 4 months ago

I would argue for the opposite: allowing standOff to bear its own header. That would solve so many issues at once...

lb42 commented 4 months ago

Sorry Laurent, but that seems even madder. My understanding is that by its nature a standOff is not complete: it is useful in conjunction with something else (the text which it stands off of!). There is therefore a need for something to group the two together. And the TEI or teiCorpus as currently defined does that job very nicely, thank you.

laurentromary commented 4 months ago

The reasoning is that there is a whole bunch of metadata that is specific to a given annotation layer (from who or what did the job, to specific licensing or specific... taxonomies). Look at any linguistic annotation project built on a pre-existing corpus. Two rooms, two atmospheres.

lb42 commented 4 months ago

Is it often the case that you have multiple annotation layers, each represented by a separate standOff? But even in that case, what's wrong with wrapping each one with its own teiHeader inside its own TEI?

laurentromary commented 4 months ago

If you do so, you don't even need standOff. Agreed: a teiCorpus with a main TEI and additional ones for annotations, but then you can have your spanGrp, linkGrp, etc. in the body of the TEI and drop standOff altogether. This is not +exactly+ what was intended.

sydb commented 4 months ago

I think @lb42 and I are not really that far apart. But if I had my druthers, it would not be that <standOff> is a child of <teiHeader>, but rather that <taxonomy> had always been a child of <standOff>.

In part, I think of <standOff> as the useful container for stuff that is not direct transcription of the source, but is not (at least in the transcriber’s mind) truly metadata about the TEI document, either. Simultaneously, some data is best constrained closer to the transcription than the ODD file, and <standOff> can serve that purpose, as well. (See my Balisage paper on the topic, if you care.)

The <taxonomy> element, which may well contain a bibliographic citation to a taxonomy (using, e.g., <bibl>) instead of an actual definition of a taxonomy (using <category>s), was originally intended to be used for classifying the current TEI document within some useful taxonomy of texts. For that use, it is (IMHO) far more metadata than data, and placement in the /TEI/teiHeader/encodingDesc/classDecl is quite reasonable. But, for better or worse, over the years lots of folks have used <taxonomy> to define all sorts of other taxonomies for all sorts of other things, and TEI has responded (again, for better or worse) by changing the definition from “defines a typology used to classify texts either implicitly, by means of a bibliographic citation, or explicitly by a structured taxonomy” (emphasis (of what was changed) mine) to “defines a typology either implicitly, by means of a bibliographic citation, or explicitly by a structured taxonomy”, even though it is still a child of <classDecl> and discussed in that section of the main prose. These other taxonomies, which are not directly about the categorization of the TEI file itself, seem to me to make more sense in <standOff>. Given that a) these taxonomies, like personographies and annotations, are a set of things that either point into the transcribed text or are pointed at by the transcribed text, and b) these taxonomies are (in my entirely unscientific perception of the TEI universe) far more common than those that are strictly about categorizing the current text, I am (at least for now) in favor of allowing them in <standOff>.

I think we are literally at the point where one man’s metadata is another’s data. :grin:

martindholmes commented 3 months ago

I'd like to add the proposal that <handNotes> also be allowed in <standOff>. The same arguments pertain, I think: it's a list of items that may be pointed to from inside a transcribed text. @lb42's arguments regarding header vs standOff pertain here too, but it's just another kind of ography like other ographies.
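A rough sketch of what that would allow. It is not valid against the current Guidelines, and the hand descriptions and identifiers are invented for illustration:

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample transcription</title></titleStmt>
      <publicationStmt><p>Unpublished illustration.</p></publicationStmt>
      <sourceDesc><p>Transcribed from a hypothetical manuscript.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <standOff>
    <!-- PROPOSED ONLY: <handNotes> is not currently allowed as a child of <standOff> -->
    <handNotes>
      <handNote xml:id="hand.main">Main scribal hand.</handNote>
      <handNote xml:id="hand.corr">Later corrector.</handNote>
    </handNotes>
  </standOff>
  <text>
    <body>
      <!-- The transcription points at a hand via @hand -->
      <p>An example <add hand="#hand.corr">corrected</add> passage.</p>
    </body>
  </text>
</TEI>
```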