google-code-export / dkpro-core-asl

Automatically exported from code.google.com/p/dkpro-core-asl

Annotation types for speakers and direct/-indirect speech #599

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
There is a new annotation tool for quoted speech in CoreNLP that we might want 
to integrate, but we have no types for such a thing yet.

Also, speaker information can be relevant for certain document types in DH 
contexts.

Maybe we need several new types, e.g.

- Utterance
- Speaker
- QuotedSpeech

or maybe different types? Any opinions/examples?

Original issue reported on code.google.com by richard.eckart on 10 Mar 2015 at 3:34

GoogleCodeExporter commented 9 years ago
The DH community mostly uses TEI elements to encode text; maybe we should choose 
types in accordance with what TEI offers, e.g. 
http://www.tei-c.org/release/doc/tei-p5-doc/en/html/TS.html? Just an idea.

Original comment by daxenber...@gmail.com on 11 Mar 2015 at 8:12

GoogleCodeExporter commented 9 years ago
There are also some new specifications being worked on for TEI and text: 
http://www.tei-c.org/Activities/Council/Working/tcw25.xml

For me the main question is how much of these ideas we currently need to 
represent. TEI tends to be very complex and we probably do not need all of the 
stuff right now.

Original comment by richard.eckart on 11 Mar 2015 at 8:21

GoogleCodeExporter commented 9 years ago
Using TEI naming and semantics is probably a good idea, but I agree with 
Richard that we should not introduce everything.

Our policy has always been to only introduce types for which we already have a 
use case :)

Original comment by torsten....@gmail.com on 11 Mar 2015 at 8:23

GoogleCodeExporter commented 9 years ago
We should not mix quoted speech (for which we have a new annotator in CoreNLP) 
and transcribed speech (for which we do not yet have annotators).

These are also treated differently in TEI; see, e.g., the example of quoted 
speech given here:
http://www.tei-c.org/release/doc/tei-p5-doc/de/html/SA.html

Zui-Gan called out to himself every day, ‘Master.’

Then he answered himself, ‘Yes, sir.’

And then he added, ‘Become sober.’

Again he answered, ‘Yes, sir.’

‘And after that,’ he continued, ‘do not be deceived by others.’

‘Yes, sir; yes, sir,’ he replied.
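
To make the idea of a quoted-speech span concrete, here is a minimal sketch that pulls the quoted stretches out of one line of the example together with their character offsets. It is only an illustration: the regular expression and the function name `find_quoted_spans` are made up for this example, it assumes non-nested single curly quotes, and it is not how the CoreNLP annotator works.

```python
import re

# Matches the text between a left (‘) and a right (’) single curly quote.
# Assumption: quotes are not nested and the quoted text contains no ’ apostrophes.
QUOTE_RE = re.compile(r"‘([^‘’]*)’")


def find_quoted_spans(text):
    """Return (begin, end, covered_text) per quoted span; offsets exclude the quote marks."""
    return [(m.start(1), m.end(1), m.group(1)) for m in QUOTE_RE.finditer(text)]


line = "‘And after that,’ he continued, ‘do not be deceived by others.’"
spans = find_quoted_spans(line)

for begin, end, covered in spans:
    print(begin, end, covered)
```

Each tuple is roughly what a begin/end-based annotation over the document text would carry.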

see also: Core Tags for Drama 
http://www.tei-c.org/release/doc/tei-p5-doc/de/html/CO.html#CODR
and Performance Texts 
http://www.tei-c.org/release/doc/tei-p5-doc/de/html/DR.html#DRPAL

Transcribed speech requires different annotation types that are not relevant 
for quoted speech, see Transcriptions of Speech 
http://www.tei-c.org/release/doc/tei-p5-doc/de/html/TS.html which Johannes 
mentioned earlier

See also:
http://www.tei-c.org/release/doc/tei-p5-doc/de/html/DR.html#DRPAL:
 8 Transcriptions of Speech. These would be appropriate for encodings the focus of which is on the actual performance of a text rather than its structure or formal properties. The module described in that chapter includes a large number of other detailed proposals for the encoding of such features as voice quality, prosody, etc., which might be relevant to such a treatment of performance texts.

Judith

Original comment by eckle.kohler on 11 Mar 2015 at 8:45

GoogleCodeExporter commented 9 years ago
Fully agree. Besides the obvious (speaker etc.), TEI has:

- u (utterance): contains a stretch of speech usually preceded and followed by 
silence or by a change of speaker.
- pause: marks a pause either between or within utterances.
- vocal: marks any vocalized but not necessarily lexical phenomenon, for 
example voiced pauses, non-lexical backchannels, etc.
- kinesic: marks any communicative phenomenon, not necessarily vocalized, for 
example a gesture, frown, etc.
- incident: marks any phenomenon or occurrence, not necessarily vocalized or 
communicative, for example incidental noises or other events affecting 
communication.
- writing: contains a passage of written text revealed to participants in the 
course of a spoken text.
- shift: marks the point at which some paralinguistic feature of a series of 
utterances by any one speaker changes.

Maybe we should condense this to 3-4 elements, e.g. the first 4?

Original comment by daxenber...@gmail.com on 11 Mar 2015 at 8:48

GoogleCodeExporter commented 9 years ago
Judith is right, we should carefully consider how to separate transcribed 
speech (which I had in mind, and which is a relevant document type in DH) and 
quoted speech (which CoreNLP offers, but apparently at a very basic level). 
Maybe the TEI conventions go too far here, and we should rather keep it simple 
for now, with transcribed speech in mind for a later point in time.

Original comment by daxenber...@gmail.com on 11 Mar 2015 at 9:44

GoogleCodeExporter commented 9 years ago
Ok, so I gather we should have at least two annotation types:

One for transcribed speech that roughly corresponds to the TEI "u" element.

One for quoted speech that roughly corresponds to the TEI "q" element.

Both should have a feature that indicates who is the speaker, roughly 
corresponding to the "who" attribute on the "q" and "u" TEI elements.

My feeling is that this would cover all immediate needs.
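
As a rough, language-neutral sketch of that proposal (the actual DKPro Core types would be UIMA types, not Python classes), the two annotation types could look like the following. The class names `Span`, `Utterance` and `QuotedSpeech` and the field name `speaker` are assumptions chosen for illustration, not existing DKPro Core types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical base: a stretch of document text given by character offsets."""
    begin: int
    end: int

    def covered_text(self, document: str) -> str:
        return document[self.begin:self.end]


@dataclass(frozen=True)
class Utterance(Span):
    """Transcribed speech, roughly the TEI "u" element."""
    speaker: Optional[str] = None  # roughly the TEI "who" attribute


@dataclass(frozen=True)
class QuotedSpeech(Span):
    """Quoted speech, roughly the TEI "q" element."""
    speaker: Optional[str] = None  # roughly the TEI "who" attribute


doc = "Then he answered himself, ‘Yes, sir.’"
quote = QuotedSpeech(begin=doc.index("‘") + 1, end=doc.index("’"), speaker="Zui-Gan")
print(quote.covered_text(doc), "/", quote.speaker)
```

Keeping the two types separate, rather than one type with a "kind" flag, mirrors the TEI distinction between "q" and "u".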

Original comment by richard.eckart on 11 Mar 2015 at 12:37

GoogleCodeExporter commented 9 years ago
Sounds like a good starting point to me.

Original comment by daxenber...@gmail.com on 11 Mar 2015 at 12:42

GoogleCodeExporter commented 9 years ago
I think a good place to put such types would be the api.segmentation module.

Original comment by richard.eckart on 11 Mar 2015 at 12:44

GoogleCodeExporter commented 9 years ago
Hi DKPro Core people!

There is actually an ISO standard for transcriptions of speech now, based on 
TEI, resulting from the work presented in the link Richard posted above 
(http://www.tei-c.org/Activities/Council/Working/tcw25.xml) and interoperable 
with most of the common transcription formats and even some widely used 
transcription conventions. You can find more info at 
http://www1.ids-mannheim.de/prag/muendlichekorpora/isodin.html and 
http://www.exmaralda.org/en/tool/tei_drop/, and otherwise I'd try to answer any 
questions you might have...

Best regards,
Hanna Hedeland, HZSK/CLARIN-D

Original comment by hanna.he...@gmail.com on 11 Mar 2015 at 1:15

GoogleCodeExporter commented 9 years ago
I think the annotations "quoted speech" and "speaker" would definitely be 
handy. 

Two things that would additionally suit my purposes:
1) storing the quoted speech utterances in some sort of structured way, for 
example:
‘And after that,’ he continued, ‘do not be deceived by others.’
are two Utterances of one DirectSpeech of one Speaker.

2) storing the speaker as a probability vector - in modern literature it is 
less common to see: 
"There," said John
but rather things like:
"There!" John's finger pointed to Jack. 

Then I would save in the speaker something like [John, 0.9; Jack, 0.1] 
But perhaps the general case is annotated, known speakers, and speaker 
prediction is just a special use case.

Original comment by l.flek...@gmail.com on 11 Mar 2015 at 1:48

GoogleCodeExporter commented 9 years ago
@l.flekova 

@1) Isn't it one utterance that is interrupted by a piece of text? I mean, I 
suppose that "he" didn't actually make a significant pause while saying that. 
There has been some thought on whether it might be useful/necessary to be 
able to model discontinuous utterances (or in this case quoted speech spans).
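
One way such a discontinuous span could be modelled is as a single annotation holding several (begin, end) segments. This is a sketch only; the class name `DiscontinuousQuote` and its fields are hypothetical and nothing like it exists in DKPro Core.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DiscontinuousQuote:
    """Hypothetical: one quoted speech made up of several text segments."""
    segments: List[Tuple[int, int]]  # (begin, end) character offsets, in text order
    speaker: Optional[str] = None

    def covered_text(self, document: str, joiner: str = " ") -> str:
        # Join the segments, skipping the narrator text in between.
        return joiner.join(document[b:e] for b, e in self.segments)


line = "‘And after that,’ he continued, ‘do not be deceived by others.’"
first, second = "And after that,", "do not be deceived by others."
b1, b2 = line.index(first), line.index(second)

quote = DiscontinuousQuote(
    segments=[(b1, b1 + len(first)), (b2, b2 + len(second))],
    speaker="Zui-Gan",
)
print(quote.covered_text(line))
```

The alternative would be two separate annotations linked by a shared identifier; which is preferable likely depends on how consumers iterate over the annotations.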

@2) We do not have a concept of probabilities in any DKPro Core types yet. 
Right now, we assume that there is one truth. I think it would fall to a 
particular experiment setup to sub-class the DKPro Core type adding 
probabilities as needed for the individual setup.
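
A sketch of what such an experiment-specific sub-class might look like, again in Python for brevity rather than as a UIMA type. `QuotedSpeech` here is a stand-in for the core type, and `ScoredQuotedSpeech`, `speaker_scores` and `resolve_speaker` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class QuotedSpeech:
    """Stand-in for the core type: exactly one speaker ("one truth")."""
    begin: int
    end: int
    speaker: Optional[str] = None


@dataclass
class ScoredQuotedSpeech(QuotedSpeech):
    """Hypothetical experiment-specific sub-class adding candidate probabilities."""
    speaker_scores: Dict[str, float] = field(default_factory=dict)

    def resolve_speaker(self) -> Optional[str]:
        # Collapse the distribution to the single best candidate, if there is one.
        if self.speaker_scores:
            self.speaker = max(self.speaker_scores, key=self.speaker_scores.get)
        return self.speaker


quote = ScoredQuotedSpeech(begin=1, end=7, speaker_scores={"John": 0.9, "Jack": 0.1})
print(quote.resolve_speaker())
```

Code that only knows the core type can still read `speaker`, while the experiment keeps the full distribution.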

Original comment by richard.eckart on 11 Mar 2015 at 3:07

GoogleCodeExporter commented 9 years ago
Hi Richard, 

@2) Makes sense, no objections to that.

@1) In principle you are right. I'll follow up if I can think of a 
counter-example where one would need to model it.

Original comment by l.flek...@gmail.com on 11 Mar 2015 at 4:22