dkpro / dkpro-core

Collection of software components for natural language processing (NLP) based on the Apache UIMA framework.

Annotation types for speakers and direct/-indirect speech #599

Open reckart opened 9 years ago

reckart commented 9 years ago
There is a new annotation tool for quoted speech in CoreNLP that we might want to integrate,
but we have no types for such a thing yet.

Also, speaker information can be relevant for certain document types from DH contexts.

Maybe we need several new types, e.g.

- Utterance
- Speaker
- QuotedSpeech

or maybe different types? Any opinions/examples?

Original issue reported on by richard.eckart on 2015-03-10 15:34:51

reckart commented 9 years ago
The DH community mostly uses TEI elements to encode text, so maybe we should choose types
in accordance with what TEI offers, e.g.
Just an idea.

Original issue reported on by daxenberger.j on 2015-03-11 08:12:22

reckart commented 9 years ago
There are also some new specifications being worked on for TEI and text:

For me the main question is how much of these ideas we currently need to represent.
TEI tends to be very complex and we probably do not need all of it right now.

Original issue reported on by richard.eckart on 2015-03-11 08:21:32

reckart commented 9 years ago
Using TEI naming and semantics is probably a good idea, but I agree with Richard that
we should not introduce everything.

Our policy has always been to only introduce types for which we already have a use
case :)

Original issue reported on by torsten.zesch on 2015-03-11 08:23:23

reckart commented 9 years ago
We should not mix quoted speech (for which we have a new annotator in CoreNLP) and transcribed
speech (for which we do not yet have annotators).

These are also treated differently in TEI; see e.g. the example of quoted speech given here:

Zui-Gan called out to himself every day, ‘Master.’

Then he answered himself, ‘Yes, sir.’

And then he added, ‘Become sober.’

Again he answered, ‘Yes, sir.’

‘And after that,’ he continued, ‘do not be deceived by others.’

‘Yes, sir; yes, sir,’ he replied.

see also: Core Tags for Drama and Performance Texts

Transcribed speech requires different annotation types that are not relevant for quoted
speech; see Transcriptions of Speech, which Johannes mentioned earlier.

See also:
 8 Transcriptions of Speech. These would be appropriate for encodings the focus of
which is on the actual performance of a text rather than its structure or formal properties.
The module described in that chapter includes a large number of other detailed proposals
for the encoding of such features as voice quality, prosody, etc., which might be relevant
to such a treatment of performance texts.


Original issue reported on by eckle.kohler on 2015-03-11 08:45:56

reckart commented 9 years ago
Fully agree. Besides the obvious (speaker etc.), TEI has:

- u (utterance): contains a stretch of speech usually preceded and followed by silence
or by a change of speaker.
- pause: marks a pause either between or within utterances.
- vocal: marks any vocalized but not necessarily lexical phenomenon, for example voiced
pauses, non-lexical backchannels, etc.
- kinesic: marks any communicative phenomenon, not necessarily vocalized, for example
a gesture, frown, etc.
- incident: marks any phenomenon or occurrence, not necessarily vocalized or communicative,
for example incidental noises or other events affecting communication.
- writing: contains a passage of written text revealed to participants in the course
of a spoken text.
- shift: marks the point at which some paralinguistic feature of a series of utterances
by any one speaker changes.

Maybe we should condense this to 3-4 elements, e.g. the first 4?

Original issue reported on by daxenberger.j on 2015-03-11 08:48:58

reckart commented 9 years ago
Judith is right, we should carefully consider how to separate transcribed speech (which
I had in mind, and which is a relevant document type in DH) and quoted speech (which
CoreNLP offers, but apparently at a very basic level). Maybe the TEI conventions go
too far here, and we should rather keep it simple for now, with transcribed speech in
mind for a later point in time.

Original issue reported on by daxenberger.j on 2015-03-11 09:44:03

reckart commented 9 years ago
Ok, so I gather we should have at least two annotation types:

One for transcribed speech that roughly corresponds to the TEI "u" element.

One for quoted speech that roughly corresponds to the TEI "q" element.

Both should have a feature that indicates who is the speaker, roughly corresponding
to the "who" attribute on the "q" and "u" TEI elements.

My feeling is that this would cover all immediate needs.
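
As a rough illustration of the proposal above, the two types could be sketched as plain data classes. This is only a sketch in Python, not DKPro Core or UIMA code: the names `Span`, `Utterance`, `QuotedSpeech` and the feature name `who` are assumptions chosen to mirror the TEI "u" and "q" elements and their "who" attribute.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """Character offsets into the document text (end is exclusive)."""
    begin: int
    end: int

    def covered_text(self, text: str) -> str:
        return text[self.begin:self.end]


@dataclass(frozen=True)
class Utterance(Span):
    """Transcribed speech, roughly TEI "u" (hypothetical type name)."""
    who: Optional[str] = None  # speaker, roughly the TEI "who" attribute


@dataclass(frozen=True)
class QuotedSpeech(Span):
    """Quoted speech, roughly TEI "q" (hypothetical type name)."""
    who: Optional[str] = None  # speaker, roughly the TEI "who" attribute


# Annotate the quoted part of one sentence from the example earlier in the thread.
text = "Then he answered himself, ‘Yes, sir.’"
quote = QuotedSpeech(text.index("‘"), len(text), who="Zui-Gan")
print(quote.covered_text(text), "spoken by", quote.who)
```

Keeping the two types separate, each with its own speaker feature, matches the point made earlier that quoted and transcribed speech should not be mixed.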

Original issue reported on by richard.eckart on 2015-03-11 12:37:43

reckart commented 9 years ago
Sounds like a good starting point to me.

Original issue reported on by daxenberger.j on 2015-03-11 12:42:37

reckart commented 9 years ago
I think a good place to put such types would be the api.segmentation module.

Original issue reported on by richard.eckart on 2015-03-11 12:44:21

reckart commented 9 years ago
Hi DKPro Core people!

There is actually an ISO standard for transcriptions of speech now, based on TEI, resulting
from the work presented in the link Richard posted above (
and interoperable with most of the common transcription formats and even some widely
used transcription conventions. You can find more info at
and, and otherwise I'd try to answer any
questions you might have...

Best regards,
Hanna Hedeland, HZSK/CLARIN-D

Original issue reported on by hanna.hedeland on 2015-03-11 13:15:03

reckart commented 9 years ago
I think the annotations "quoted speech" and "speaker" would definitely be handy.

Two things that would additionally suit my purposes:
1) storing the quoted speech utterances in some sort of structured way, for example:
‘And after that,’ he continued, ‘do not be deceived by others.’
are two Utterances of one DirectSpeech of one Speaker.

2) storing the speaker as a probability vector - in modern literature it is less common
to see: 
"There," said John
but rather things like:
"There!" John's finger pointed to Jack. 

Then I would save in the speaker something like [John, 0.9; Jack, 0.1] 
But perhaps the general case is annotated, known speakers, and the speaker prediction
is just a special use-case.
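
A minimal sketch of what such a speaker vector might look like, written in Python rather than as a DKPro Core type. The class name `SpeakerCandidates` and its methods are hypothetical and only illustrate the idea; a known, annotated speaker is then simply the case of a single candidate with weight 1.0.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SpeakerCandidates:
    """Candidate speakers with weights (hypothetical helper, not a DKPro type)."""
    scores: Dict[str, float] = field(default_factory=dict)

    def normalized(self) -> Dict[str, float]:
        """Return the weights rescaled so that they sum to 1."""
        total = sum(self.scores.values())
        if total <= 0:
            raise ValueError("at least one candidate needs a positive weight")
        return {name: score / total for name, score in self.scores.items()}

    def best(self) -> str:
        """Return the candidate with the highest weight."""
        return max(self.scores, key=self.scores.get)


# Predicted speaker for: "There!" John's finger pointed to Jack.
predicted = SpeakerCandidates({"John": 0.9, "Jack": 0.1})

# Annotated, known speaker: a single candidate with full weight.
known = SpeakerCandidates({"John": 1.0})

print(predicted.best(), predicted.normalized())
```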

Original issue reported on by l.flekova on 2015-03-11 13:48:12

reckart commented 9 years ago

@1) Isn't it one utterance that is interrupted by a piece of text? I mean, I suppose
that "he" didn't actually make a significant pause while saying that. There has been
some contemplation about whether it might be useful/necessary to be able to model discontinuous
utterances (or in this case quoted speech spans).
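
To make the discontinuous case concrete, here is a small Python sketch in which one quoted-speech annotation holds several character ranges. The names `DiscontinuousQuote` and `find_quoted_segments` are hypothetical, and the quote detection is deliberately naive: it assumes typographic quotes are balanced and that ’ is never used as an apostrophe.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DiscontinuousQuote:
    """One quoted-speech annotation made of several (begin, end) ranges."""
    segments: List[Tuple[int, int]]
    who: Optional[str] = None

    def covered_text(self, text: str, joiner: str = " ") -> str:
        """Join the text of all segments in document order."""
        return joiner.join(text[b:e] for b, e in sorted(self.segments))


def find_quoted_segments(text: str, open_q: str = "‘",
                         close_q: str = "’") -> List[Tuple[int, int]]:
    """Naively collect (begin, end) offsets of each quoted stretch."""
    segments = []
    pos = 0
    while True:
        begin = text.find(open_q, pos)
        if begin < 0:
            break
        end = text.find(close_q, begin + 1)
        if end < 0:
            break
        segments.append((begin, end + 1))  # end offset is exclusive
        pos = end + 1
    return segments


text = "‘And after that,’ he continued, ‘do not be deceived by others.’"
quote = DiscontinuousQuote(find_quoted_segments(text))
print(quote.covered_text(text))
```

Whether the two stretches count as one annotation or two is exactly the modelling question raised here; the sketch only shows that a single annotation with several ranges is one way to represent it.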

@2) We do not have a concept of probabilities in any DKPro Core types yet. Right now,
we assume that there is one truth. I think it would fall to a particular experiment
setup to sub-class the DKPro Core type adding probabilities as needed for the individual

Original issue reported on by richard.eckart on 2015-03-11 15:07:44

reckart commented 9 years ago
Hi Richard, 

@2) Makes sense, no objections to that.

@1) In principle you are right. I'll follow up if I can think of a counter-example
where one would need to model it.

Original issue reported on by l.flekova on 2015-03-11 16:22:40