Closed TEITechnicalCouncil closed 7 years ago
This issue was originally assigned to SF user: louburnard Current user is: lb42
We should probably see how we could also deal with such cases by means of the stand-off element. I see the two options as complementary flavors: for many pieces of speech annotation software an interleaved representation à la annotatedU is easier, whereas for some other use cases it is better to leave the primary transcription "untouched".
Original comment by: @laurentromary
After going back and forth between the ISO proposal and the stdf proposal, I see the possibility of creating an element slightly more generic than annotatedU, which we could call annotationGrp. This element could be used to group together a series of annotations associated with the same primary object (e.g. the same u element), either by having this object as a child (i.e. what we wanted with annotatedU: a u with a series of spanGrp, for instance) or in stand-off mode within the annotations sub-element of stdf. The specification of this element could be as follows:
<elementSpec ident="annotationGrp" mode="add" ns="http://standoff.proposal">
  <desc>Groups together various annotations, for instance for parallel interpretations of a spoken segment</desc>
  <classes>
    <memberOf key="model.annotationPart"/>
    <memberOf key="model.divPart.spoken"/>
    <memberOf key="att.timed"/>
    <memberOf key="att.global"/>
    <memberOf key="att.ascribed"/>
  </classes>
  <content>
    <rng:zeroOrMore>
      <rng:choice>
        <rng:ref name="u"/>
        <rng:ref name="model.global.meta"/>
        <rng:ref name="model.annotationPart"/>
      </rng:choice>
    </rng:zeroOrMore>
  </content>
</elementSpec>
with the idea that model.annotationPart would be the hook where one could add any kind of internal or external annotation object. For instance in my tests, I make model.global.meta member of this class to get spanGrp and the like in it.
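To make the inline mode concrete, here is a minimal sketch of what an instance might look like under the spec above. The namespace prefix `tei`, the identifiers, the speaker and timeline references, and the part-of-speech labels are all invented for illustration; only the element and attribute names come from the proposal or from existing TEI.

```xml
<!-- Hypothetical instance of the proposed annotationGrp, inline mode:
     the utterance is a child, followed by one annotation tier. -->
<annotationGrp xmlns="http://standoff.proposal"
               xmlns:tei="http://www.tei-c.org/ns/1.0"
               who="#SPK0" start="#T0" end="#T1">
  <tei:u xml:id="u1">
    <tei:w xml:id="w1">good</tei:w>
    <tei:w xml:id="w2">morning</tei:w>
  </tei:u>
  <!-- spanGrp gets in via model.global.meta, as described above -->
  <tei:spanGrp type="pos">
    <tei:span from="#w1" to="#w1">ADJ</tei:span>
    <tei:span from="#w2" to="#w2">NOUN</tei:span>
  </tei:spanGrp>
</annotationGrp>
```

The who, start and end attributes are available here because the spec makes annotationGrp a member of att.ascribed and att.timed.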
Original comment by: @laurentromary
Generalizing is always nice. But what is "stdf" please?
Original comment by: @lb42
stdf is a proposed element badly in need of a name approved for all audiences.
Please see ticket #378, then the google doc linked from there, then Peter Stadler's ODD proposal for standoff annotations, linked from the google doc...
Original comment by: @bansp
There is also a GitHub project (https://github.com/laurentromary/stdfSpec), where I maintain updates on the stdf proposal and some samples, which show how annotatedU can be used inline or stand-off in relation to speech transcription.
Original comment by: @laurentromary
Original comment by: @lb42
Referring to the document at https://docs.google.com/document/d/1BTjYHSiPjD6GhKMNFmZrrvCkLQAa1RK7aGbG5K50uN4
Section 6.5.2 ("Representation as unclear or gap") says that when a string of words is unclear, and alternatives are proposed, the strings should each be wrapped in a separate span element (within choice, within unclear). I think this was meant to say "a separate seg element"; and indeed the examples given two sections later (6.5.4) use seg, not span. Probably just the usual code-switching problem between HTML span and TEI seg.
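For readers without the document to hand, the nesting described there (seg within choice within unclear) would look roughly like the sketch below. The wording of the utterance and the speaker reference are invented; this is not copied from the ISO draft.

```xml
<!-- Illustrative only: two proposed readings of an unclear stretch,
     each wrapped in its own seg. -->
<u xmlns="http://www.tei-c.org/ns/1.0" who="#SPK0">
  I saw
  <unclear>
    <choice>
      <seg>a bear</seg>
      <seg>a pear</seg>
    </choice>
  </unclear>
  yesterday
</u>
```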
Section 5.7 (6.7 as listed in the TOC) on "Global divisions" proposes that divisions of the transcription at levels superordinate to the utterance should be accomplished by the use of non-tessellating divs. Unless utterance and annotated utterance themselves are regarded as syntactic sugar for div type="utterance", this is surely a very un-TEI way of doing things. Do we really mean to slip floating divs into the scheme by this means?
Original comment by: @pfschaffner
I have suggested a revision to the document precluding non-tessellating divs. In the meantime, do we have agreement on introducing a new <annotatedU> element? A spec for it would look something like this:
<elementSpec ident="annotatedU" ns="http://iso-tei-spoken.org/ns/1.0">
  <desc>groups an utterance with the annotation layers associated with it</desc>
  <classes>
    <memberOf key="model.divPart.spoken"/>
  </classes>
  <content>
    <group xmlns="http://relaxng.org/ns/structure/1.0">
      <ref name="u"/>
      <oneOrMore>
        <ref name="spanGrp"/>
      </oneOrMore>
    </group>
  </content>
</elementSpec>
Original comment by: @lb42
@Lou: please see above the new name and specification for annotationGrp, including the creation of a class model.annotationPart that allows easy customization of the content depending on the kind of annotation object people will use (e.g. term entries, NER, open annotation objects, what have you).
Original comment by: @laurentromary
So you want to replace "annotatedU" with "annotationGrp" ?
Original comment by: @lb42
Yes. See Thomas' last document.
Original comment by: @laurentromary
For the benefit of others trying to follow this ticket, "Thomas' last document" is an entirely new docx version of the googledoc, the existence of which I learned of about 20 minutes ago when he sent me a copy !
Original comment by: @lb42
The current version of this latest draft is now available from https://sourceforge.net/p/tei/code/HEAD/tree/trunk/Incubator/Spoken/ISO-TEI-Transcription_of_spoken_language_FINAL_DRAFT_EDIT2_LR.docx
Original comment by: @lb42
Could we put this in a password-protected place? We may have a problem with ISO copyrighted documents. (I am +not+ opening a debate, just mentioning.)
Original comment by: @laurentromary
Well, we have the wiki, but that is hardly secure. If you want to restrict access to this document, then clearly it is not yet ready for discussion by the TEI, so I will remove it.
Original comment by: @lb42
The latest version of the ISO proposal has apparently renamed this element as "annotationGrp". Unfortunately, TEI naming conventions require that an element named xxxGrp contains only xxx elements, which is not the case here. Perhaps a better name might be "annotationUnit" or "annotationBlock" ?
I must say I like both (annotationUnit or annotationBlock). If a decision could be taken quickly by the council, we would make sure that the final ISO publication refers to it. We actually presented the case in ISO as pending the naming decision by the TEI council.
So are we agreed on the following: a) we add a new element <annotationBlock> with a structure like that proposed above (under the name "annotationGrp"); b) we add some discussion and examples of its use to the current TS (transcribed speech) chapter, and probably also refer to it in the current AI (analytic info) chapter?
If so, I'd appreciate some help confecting the latter. Laurent? Tomas?
I am sending to Hugh the ISO document which is under balloting and from which the council can take up examples. Come back to me and Thomas (now subscribed to both tickets) for any additional information.
We are about to lock the element name to <annotationBlock> (the ballot finishes in one week), so it would be optimal not to move to another name. In any case, the content model should validate all examples from the ISO document, thus making <u> optional to allow a stand-off mechanism, and should rely on a class (model.annotationBlockPart?) to allow more than just <span> and make further customization easy.
I've now seen the PDF of the draft: it still says "annotationGrp" rather than "annotationBlock", but on the assumption that you will change that, I can do my best to get "annotationBlock" into the next release of TEI P5 (due around Easter time). I will also check the examples in the PDF file (this would be easier if I had the source): how do you want to be notified of any problems that show up?
Of course, since it is under ballot. We have already filed a comment requesting the change to annotationBlock, so please go ahead with the implementation. Please notify me and Tomas if anything is wrong.
In which TEI module should <annotationBlock> be defined? In spoken or in analysis ?
Clearly analysis. It is potentially a tool for grouping annotations related to quite a range of objects, and of course an essential piece for standOff.
I concur -- it would be ideal for it to sit in a standoff module, but since there is no such module (yet?), analysis is definitely the way to go.
some simple usage examples would be very helpful, if anyone has them.
Following a more in-depth discussion with @lb42 we suggest making the content model of <annotationBlock> more flexible by means of two sub-classes, model.annotableSegment and model.annotation. The content model of <annotationBlock> would be something like:
(model.annotableSegment?, model.annotation*)
These classes could be bootstrapped with typical TEI elements that would have the appropriate semantics for the corresponding function in annotationBlock:
model.annotableSegment: <u> (as in the ISO standard proposal), <seg> (for written texts), <zone> (when the annotation is directly about an image)
model.annotation: <span> and <spanGrp> (cf. ISO document), <interp> and <interpGrp> (obvious...), <fs> (generic purpose FS based annotation)
In the case of a stand-off use of annotationBlock, we may consider either making the annotableSegment optional or using <span> to point to the annotated object.
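As a rough illustration, the content model (model.annotableSegment?, model.annotation*) could be written in ODD along the following lines. This is a sketch, not the spec that went into the Guidelines: the class descriptions, the wrapping specGrp, and the choice to bootstrap only <u> and <spanGrp> are assumptions made to keep the example short.

```xml
<!-- Sketch only: class names come from the proposal above;
     descriptions and the selection of bootstrapped elements are assumptions. -->
<specGrp xmlns="http://www.tei-c.org/ns/1.0"
         xmlns:rng="http://relaxng.org/ns/structure/1.0">
  <classSpec ident="model.annotableSegment" type="model" mode="add">
    <desc>groups elements that can serve as the annotated object of an annotationBlock</desc>
  </classSpec>
  <classSpec ident="model.annotation" type="model" mode="add">
    <desc>groups elements that carry annotations inside an annotationBlock</desc>
  </classSpec>
  <elementSpec ident="annotationBlock" module="analysis" mode="add">
    <desc>groups an annotable segment with the annotations associated with it</desc>
    <classes>
      <memberOf key="model.divPart.spoken"/>
      <memberOf key="att.global"/>
    </classes>
    <content>
      <rng:group>
        <!-- at most one annotated object; optional to allow stand-off use -->
        <rng:optional>
          <rng:ref name="model.annotableSegment"/>
        </rng:optional>
        <!-- any number of annotation tiers -->
        <rng:zeroOrMore>
          <rng:ref name="model.annotation"/>
        </rng:zeroOrMore>
      </rng:group>
    </content>
  </elementSpec>
  <!-- Bootstrapping: add existing elements to the new classes -->
  <elementSpec ident="u" module="spoken" mode="change">
    <classes mode="change">
      <memberOf key="model.annotableSegment"/>
    </classes>
  </elementSpec>
  <elementSpec ident="spanGrp" module="analysis" mode="change">
    <classes mode="change">
      <memberOf key="model.annotation"/>
    </classes>
  </elementSpec>
</specGrp>
```

Further elements (<seg>, <zone>, <span>, <interp>, <interpGrp>, <fs>) would be added to the two classes in the same way, which is where the customization hook lies.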
@sydb wonders aloud (for @laurentromary to answer) if requiring the model.annotableSegment bit would get rid of the ambiguity that occurs when you want to annotate (with <seg>) a segment (encoded with <seg>). Add <ptr> to model.annotableSegment, so if you want to annotate something indicated by a pointer, put in a pointer to it!
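A stand-off block along those lines might look like the sketch below, assuming <ptr> were admitted as the annotable segment. The file name transcript.xml, the identifiers and the labels are hypothetical.

```xml
<!-- Hypothetical stand-off use: the first child points at the annotated
     utterance instead of containing it. -->
<annotationBlock xmlns="http://www.tei-c.org/ns/1.0">
  <ptr target="transcript.xml#u1"/>
  <spanGrp type="pos">
    <span from="transcript.xml#w1" to="transcript.xml#w1">ADJ</span>
    <span from="transcript.xml#w2" to="transcript.xml#w2">NOUN</span>
  </spanGrp>
</annotationBlock>
```

With the first position always filled, any <seg> appearing later in the block can only be an annotation, which is the disambiguation being asked about.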
The issue of ambiguity is one for which I do not have an answer. In theory (if XML schemas were no headache), I would like to have the two model classes above. But in practice, we may just resolve to have one and provide written guidelines as to proper usage, for instance by mapping this to the Open Annotation model as already alluded to in https://hal.inria.fr/hal-01254365 . I should push myself to submit an abstract on all this for Vienna...
@hcayless : getting tired? The comment is not related to the ticket, is it?
@laurentromary Wrong ticket. Deleted.
Council to prod LB to prod LR.
@laurentromary The element <annotationBlock> is now in the Guidelines. Can we close this issue?
Yes. There will be a specific ticket for updating the content model of annotationBlock
OK, thanks. Closing this one.
[This is the second of a few tickets related to the TEI/ISO standard for transcriptions of spoken language: see http://bit.ly/1jyZC37 ]
It is usual to segment transcribed speech into smaller chunks for which the existing <u> element is appropriate. This proposal suggests a way of grouping each such chunk with one or more tiers of annotation, as is common practice.
Original comment by: @lb42