TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

New element annotatedU #539

Closed TEITechnicalCouncil closed 7 years ago

TEITechnicalCouncil commented 9 years ago

[This is the second of a few tickets related to the TEI/ISO standard for transcriptions of spoken language: see http://bit.ly/1jyZC37 ]

It is usual to segment transcribed speech into smaller chunks for which the existing <u> element is appropriate. This proposal suggests a way of grouping each such chunk with one or more tiers of annotation, as is common practice.

Original comment by: @lb42

TEITechnicalCouncil commented 9 years ago

This issue was originally assigned to SF user: louburnard Current user is: lb42

TEITechnicalCouncil commented 9 years ago

We should probably see how we could also deal with such cases by means of the stand-off element. I see the two options as complementary flavors (for many pieces of speech annotation software an interleaved representation à la annotatedU is easier, whereas for some other use cases it is better to leave the primary transcription "untouched").

Original comment by: @laurentromary

TEITechnicalCouncil commented 9 years ago

After going back and forth between the ISO proposal and the stdf proposal, I see the possibility of creating an element that would be slightly more generic than annotatedU, which we could call annotationGrp. This element could be used to group together a series of annotations associated with the same primary object (e.g. the same u element), either by having this object as a child (i.e. what we wanted with annotatedU: a u with a series of spanGrp, for instance) or in a stand-off mode within the annotations sub-element of stdf. The specification of this element could be as follows:


<elementSpec ident="annotationGrp" mode="add" ns="http://standoff.proposal">
   <desc>Groups together various annotations, for instance for parallel interpretations of a spoken segment</desc>
   <classes>
      <memberOf key="model.annotationPart"/>
      <memberOf key="model.divPart.spoken"/>
      <memberOf key="att.timed"/>
      <memberOf key="att.global"/>
      <memberOf key="att.ascribed"/>
   </classes>
   <content>
      <rng:zeroOrMore>
         <rng:choice>
            <rng:ref name="u"/>
            <rng:ref name="model.global.meta"/>
            <rng:ref name="model.annotationPart"/>
         </rng:choice>
      </rng:zeroOrMore>
   </content>
</elementSpec>

with the idea that model.annotationPart would be the hook where one could add any kind of internal or external annotation object. For instance, in my tests I make model.global.meta a member of this class to get spanGrp and the like in it.
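
To make the two modes described above concrete, here is a minimal sketch of how instances might look under the annotationGrp specification given in this comment. The prefix `so`, the speaker and timeline identifiers, and the annotation values are all invented for illustration; only the element name, its namespace and its attribute classes come from the spec above.

```xml
<!-- Hypothetical sample, not taken from the ISO or stdf documents -->
<div xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:so="http://standoff.proposal">

  <!-- Inline mode: the annotated u is a child of annotationGrp -->
  <so:annotationGrp who="#SPK0" start="#T0" end="#T1">
    <u xml:id="u1">good morning</u>
    <!-- spanGrp is allowed here via model.global.meta -->
    <spanGrp type="translation">
      <span from="#u1">bonjour</span>
    </spanGrp>
  </so:annotationGrp>

  <!-- Stand-off mode: the u stays untouched elsewhere,
       and the group only holds annotations pointing at it -->
  <u xml:id="u2" who="#SPK1">see you later</u>
  <so:annotationGrp>
    <spanGrp type="translation">
      <span from="#u2">à plus tard</span>
    </spanGrp>
  </so:annotationGrp>
</div>
```

Because the content model is a zeroOrMore choice, both forms validate against the same declaration; the difference is only whether a u child is present.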

Original comment by: @laurentromary

TEITechnicalCouncil commented 9 years ago

Generalizing is always nice. But what is "stdf" please?

Original comment by: @lb42

TEITechnicalCouncil commented 9 years ago

stdf is a proposed element badly in need of a name approved for all audiences.

Please see ticket #378, then the google doc linked from there, then Peter Stadler's ODD proposal for standoff annotations, linked from the google doc...

Original comment by: @bansp

TEITechnicalCouncil commented 9 years ago

There is also a GitHub project (https://github.com/laurentromary/stdfSpec), where I maintain updates on the stdf proposal and some samples, which show how annotatedU can be used inline or stand-off in relation to speech transcription.

Original comment by: @laurentromary

TEITechnicalCouncil commented 9 years ago

Original comment by: @lb42

TEITechnicalCouncil commented 9 years ago

Referring to the document at https://docs.google.com/document/d/1BTjYHSiPjD6GhKMNFmZrrvCkLQAa1RK7aGbG5K50uN4

Section 6.5.2 ("Representation as unclear or gap") says that when a string of words is unclear, and alternatives are proposed, the strings should each be wrapped in a separate span element (within choice, within unclear). I think this was meant to say "a separate seg element"; and indeed the examples given two sections later (6.5.4) use seg, not span. Probably just the usual code-switching problem between HTML span and TEI seg.
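
For readers without the document to hand, the nesting described for section 6.5.2 (seg within choice within unclear) would look roughly like the following. This is a sketch based only on the description in this comment; the wording of the utterance and the speaker identifier are invented.

```xml
<!-- Hypothetical utterance: the transcriber cannot decide between two hearings -->
<u xmlns="http://www.tei-c.org/ns/1.0" who="#SPK0">
  I saw
  <unclear>
    <choice>
      <!-- each alternative reading is a seg, not a span -->
      <seg>the mayor</seg>
      <seg>the mare</seg>
    </choice>
  </unclear>
  yesterday
</u>
```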

Section 5.7 (6.7 as listed in the TOC) on "Global divisions" proposes that divisions of the transcription at levels superordinate to the utterance should be accomplished by the use of non-tessellating divs. Unless utterance and annotated utterance themselves are regarded as syntactic sugar for div type="utterance", this is surely a very un-TEI way of doing things. Do we really mean to slip floating divs into the scheme by this means?

Original comment by: @pfschaffner

TEITechnicalCouncil commented 9 years ago

I have suggested a revision to the document precluding non-tessellating divs. In the meantime, do we have agreement on introducing a new <annotatedU> element, a spec for which would look something like this:

<elementSpec ident="annotatedU" ns="http://iso-tei-spoken.org/ns/1.0">
   <desc>groups an utterance with the annotation layers associated with it</desc>
   <classes>
      <memberOf key="model.divPart.spoken"/>
   </classes>
   <content>
      <group xmlns="http://relaxng.org/ns/structure/1.0">
         <ref name="u"/>
         <oneOrMore>
            <ref name="spanGrp"/>
         </oneOrMore>
      </group>
   </content>
</elementSpec>
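
As an illustration only, an instance conforming to this content model (exactly one u followed by one or more spanGrp) might look like the sketch below. The prefix `iso`, the identifiers and the annotation tiers are made up for the example; only the element name and namespace come from the spec above.

```xml
<!-- Hypothetical instance of the proposed annotatedU -->
<div xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:iso="http://iso-tei-spoken.org/ns/1.0">
  <iso:annotatedU>
    <!-- the primary transcription -->
    <u xml:id="u1" who="#SPK0" start="#T0" end="#T1">
      <w xml:id="w1">good</w>
      <w xml:id="w2">morning</w>
    </u>
    <!-- first annotation tier: part of speech, one span per word -->
    <spanGrp type="pos">
      <span from="#w1">ADJ</span>
      <span from="#w2">NOUN</span>
    </spanGrp>
    <!-- second annotation tier: free translation of the whole utterance -->
    <spanGrp type="translation">
      <span from="#w1" to="#w2">bonjour</span>
    </spanGrp>
  </iso:annotatedU>
</div>
```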

Original comment by: @lb42

TEITechnicalCouncil commented 9 years ago

@Lou: please see above the new name + specification for annotationGrp, comprising the creation of a class model.annotationPart allowing an easy customization of the content depending on the kind of annotation object people will use (e.g. term entries, NER, open annotation objects, what have you).

Original comment by: @laurentromary

TEITechnicalCouncil commented 9 years ago

So you want to replace "annotatedU" with "annotationGrp" ?

Original comment by: @lb42

TEITechnicalCouncil commented 9 years ago

Yes. See Thomas' last document.

Original comment by: @laurentromary

TEITechnicalCouncil commented 9 years ago

For the benefit of others trying to follow this ticket, "Thomas' last document" is an entirely new docx version of the googledoc, the existence of which I learned of about 20 minutes ago when he sent me a copy !

Original comment by: @lb42

TEITechnicalCouncil commented 9 years ago

The current version of this latest draft is now available from https://sourceforge.net/p/tei/code/HEAD/tree/trunk/Incubator/Spoken/ISO-TEI-Transcription_of_spoken_language_FINAL_DRAFT_EDIT2_LR.docx

Original comment by: @lb42

TEITechnicalCouncil commented 9 years ago

Could we put this behind a password-protected place? We may have a problem with ISO copyrighted documents. (I am +not+ opening a debate, just mentioning it.)

Original comment by: @laurentromary

TEITechnicalCouncil commented 9 years ago

Well, we have the wiki, but that is hardly secure. If you want to restrict access to this document, then clearly it is not yet ready for discussion by the TEI, so I will remove it.

Original comment by: @lb42

lb42 commented 9 years ago

The latest version of the ISO proposal has apparently renamed this element as "annotationGrp". Unfortunately, TEI naming conventions require that an element named xxxGrp contains only xxx elements, which is not the case here. Perhaps a better name might be "annotationUnit" or "annotationBlock" ?

laurentromary commented 9 years ago

I must say I like both (annotationUnit or annotationBlock). If a decision could be taken quickly by the council, we would make sure that the final ISO publication refers to it. We actually presented the case in ISO as pending the naming decision by the TEI council.

lb42 commented 8 years ago

So are we agreed on the following: a) we add a new element <annotationBlock> with a structure like that proposed above (under the name "annotationGrp"); b) we add some discussion and examples of its use to the current TS (transcribed speech) chapter, and probably also refer to it in the current AI (analytic info) chapter.

If so, I'd appreciate some help confecting the latter. Laurent? Thomas?

laurentromary commented 8 years ago

I am sending Hugh the ISO document which is under balloting and from which the council can take examples. Come back to me and Thomas (now subscribed to both tickets) for any additional information. We are about to lock the element name to <annotationBlock> (the ballot finishes in one week), so it would be best not to move to another name. In any case, the content model should validate all examples from the ISO document, thus making <u> optional to allow a stand-off mechanism, and should rely on a class (model.annotationBlockPart?) to allow more than just <span> and make further customization easy.

lb42 commented 8 years ago

I've now seen the PDF of the draft: it still says "annotationGrp" rather than "annotationBlock", but on the assumption that you will change that, I can do my best to get "annotationBlock" into the next release of TEI P5 (due around Easter time). I will also check the examples in the PDF file (this would be easier if I had the source): how do you want to be notified of any problems that show up?

laurentromary commented 8 years ago

Of course, since it is under ballot. We have already filed a comment requesting the change to annotationBlock. So please go ahead with the implementation. Please notify me and Thomas if anything is wrong.

lb42 commented 8 years ago

In which TEI module should <annotationBlock> be defined? In spoken or in analysis ?

laurentromary commented 8 years ago

Clearly analysis. It is potentially a tool for grouping annotations related to quite a range of objects and of course an essential piece for standOff.

bansp commented 8 years ago

I concur -- it would be ideal for it to sit in a standoff module, but since there is no such module (yet?), analysis is definitely the way to go.

lb42 commented 8 years ago

some simple usage examples would be very helpful, if anyone has them.

laurentromary commented 8 years ago

Following a more in-depth discussion with @lb42 we suggest to make the content model of <annotationBlock> more flexible by means of two sub-classes:

The content model of <annotationBlock> would be something like: (model.annotableSegment?, model.annotation*)

These classes could be bootstrapped with typical TEI elements that would have the appropriate semantic for the corresponding function in annotationBlock:

In the case of a stand-off use of annotationBlock, we may consider either to make the annotableSegment optional or use <span> to point to the annotated object.
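
One way the content model (model.annotableSegment?, model.annotation*) could be written in ODD is sketched below. This is not the specification that was adopted. The class memberships shown (u as an annotable segment, spanGrp as an annotation), the specGrp identifier, the descriptions and the choice of module="analysis" are assumptions made for illustration, following the discussion in this ticket.

```xml
<!-- Hypothetical ODD fragment; class membership is an assumption -->
<specGrp xmlns="http://www.tei-c.org/ns/1.0"
         xmlns:rng="http://relaxng.org/ns/structure/1.0"
         xml:id="annotationBlock-sketch">

  <!-- the two proposed model classes -->
  <classSpec ident="model.annotableSegment" type="model" mode="add">
    <desc>groups elements which can serve as the object being annotated</desc>
  </classSpec>
  <classSpec ident="model.annotation" type="model" mode="add">
    <desc>groups elements which carry an annotation</desc>
  </classSpec>

  <!-- (model.annotableSegment?, model.annotation*) -->
  <elementSpec ident="annotationBlock" module="analysis" mode="add">
    <desc>groups an annotable segment with the annotations associated with it</desc>
    <classes>
      <memberOf key="model.divPart.spoken"/>
    </classes>
    <content>
      <rng:group>
        <rng:optional>
          <rng:ref name="model.annotableSegment"/>
        </rng:optional>
        <rng:zeroOrMore>
          <rng:ref name="model.annotation"/>
        </rng:zeroOrMore>
      </rng:group>
    </content>
  </elementSpec>

  <!-- bootstrapping: add existing elements to the new classes -->
  <elementSpec ident="u" module="spoken" mode="change">
    <classes mode="change">
      <memberOf key="model.annotableSegment" mode="add"/>
    </classes>
  </elementSpec>
  <elementSpec ident="spanGrp" module="analysis" mode="change">
    <classes mode="change">
      <memberOf key="model.annotation" mode="add"/>
    </classes>
  </elementSpec>
</specGrp>
```

Keeping the first slot optional covers the stand-off case; a customization would only need to add further elements to either class rather than touch the content model itself.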

sydb commented 8 years ago

@sydb wonders aloud (for @laurentromary to answer) whether requiring the model.annotableSegment bit would get rid of the ambiguity that occurs when you want to annotate (with <seg>) a segment (encoded with <seg>). Add <ptr> to model.annotableSegment, so if you want to annotate something indicated by a pointer, put in a pointer to it!
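
The suggestion can be pictured with the following sketch, which assumes (hypothetically) that the first child is required, that seg belongs to both classes, and that ptr belongs to model.annotableSegment. The identifiers and text are invented.

```xml
<!-- Hypothetical instances illustrating a required first slot -->
<div xmlns="http://www.tei-c.org/ns/1.0">

  <!-- Inline: the first seg is the object, the second seg is the annotation -->
  <annotationBlock>
    <seg xml:id="s1">guten Morgen</seg>
    <seg type="translation">good morning</seg>
  </annotationBlock>

  <!-- Stand-off: the first slot is a pointer to the object annotated -->
  <annotationBlock>
    <ptr target="#s1"/>
    <seg type="register">formal</seg>
  </annotationBlock>
</div>
```

With the first slot always filled, a seg in second position can only be read as an annotation, which is the disambiguation being asked about.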

laurentromary commented 8 years ago

The issue of ambiguity is one for which I do not have an answer. In theory (if XML schemas were no headache), I would like to have the two model classes above. But in practice, we may just resolve to have one and provide written guidelines as to proper usage: for instance, mapping this to the Open Annotation model, as already alluded to in https://hal.inria.fr/hal-01254365. I should push myself to submit an abstract on all this for Vienna...

laurentromary commented 8 years ago

@hcayless : getting tired? The comment is not related to the ticket, is it?

hcayless commented 8 years ago

@laurentromary Wrong ticket. Deleted.

sydb commented 8 years ago

Council to prod LB to prod LR.

lb42 commented 7 years ago

@laurentromary The element <annotationBlock> is now in the Guidelines. Can we close this issue?

laurentromary commented 7 years ago

Yes. There will be a specific ticket for updating the content model of annotationBlock

lb42 commented 7 years ago

OK, thanks. Closing this one.