TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

FR: TEI Features for CMC #1955

Closed luengen closed 2 months ago

luengen commented 4 years ago

The TEI lacks models for encoding features of Computer-Mediated Communication (CMC). Since 2013, an international community of CMC researchers within the TEI SIG CMC has developed several TEI customisations for encoding CMC data of different genres (chat and WhatsApp logfiles, threads in discussion forums, in Usenet and on Wikipedia talk pages, sequences of tweets, etc.) so that CMC data can be integrated into corpus infrastructures. These customisations (the ‘DeRiK schema’ 2012, the ‘CoMeRe schema’ 2014 and the ‘CLARIN-D schema’ 2016) have been used in different corpus projects; the experience gained with these schemas has been documented and discussed in the work of the SIG, at TEI- and CLARIN-related events, and at conferences and workshops dedicated to the analysis of computer-mediated communication and to building and annotating CMC corpora. The customisations and resources have been made available via the wiki space of the CMC-SIG in the TEIWiki.

In 2019, we distilled a “reduce to the max” customisation from the previous customisations and the experience gained with them. We dubbed that customisation CMC-core and hereby submit it as a Feature Request to the TEI Council and community. It contains, in our view, the minimum extensions to the TEI needed for encoding textual data from CMC genres.

In a nutshell, CMC-core introduces:

  1. the <post> element as the basic unit of CMC encoding, defined as a member of model.divPart.cmc;
  2. the model class model.divPart.cmc, which allows occurrences of <post>, <u>, <kinesic>, <incident> and further elements to be used and combined within one and the same <div>;
  3. the attributes @mode, @replyTo, and @indentLevel for <post>;
  4. the optional global attribute @creation, which may indicate for any TEI element how its content was created in a CMC environment, i.e. directly by a human user, by the system, via a template, or in some other way.

CMC-core encoding example: Discussion thread on a Wikipedia talk page (Astronomical object)

<div type="thread">
   <head>Naturally occurring?</head>
   <post mode="written" xml:id="p4" indentLevel="0" who="#u005" synch="#t005">
      <p>I'm not sure that this is a proper criterium, or even what this means. What if we set an explosion that breaks a comet into two pieces? What if we build a moon? Cheers, <signed creation="template"><ref target="/wiki/User:Greenodd">Greenodd</ref> (<ref target="/wiki/User_talk:Greenodd">talk</ref>) <time>01:00, 21 July 2011 (UTC)</time></signed> </p>
   </post>
   <post mode="written" xml:id="p5" indentLevel="1" replyTo="#p4" who="#u006" synch="#t006">
      <p>Those haven't happened. If they do, we can revisit the concern. <signed creation="template"><ref target="/wiki/User:Praemonitus">Praemonitus</ref> (<ref target="/wiki/User_talk:Praemonitus">talk</ref>) <time>01:15, 1 April 2015 (UTC)</time></signed></p>
   </post>
</div>
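
To illustrate how @replyTo can be used downstream, here is a minimal Python sketch (not part of CMC-core itself) that rebuilds the reply structure of the thread above and prints it next to the recorded @indentLevel. It makes several assumptions: the post texts are abbreviated and the <signed> blocks are left out, the snippet has no TEI namespace declaration (as in the example above; in a complete TEI document the elements would be namespace-qualified), each @replyTo holds a single pointer of the form "#id", and the helper name `reply_depths` is purely illustrative.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Abbreviated copy of the example thread (assumption: signatures omitted).
THREAD = """
<div type="thread">
  <head>Naturally occurring?</head>
  <post mode="written" xml:id="p4" indentLevel="0" who="#u005" synch="#t005">
    <p>I'm not sure that this is a proper criterium, or even what this means.</p>
  </post>
  <post mode="written" xml:id="p5" indentLevel="1" replyTo="#p4" who="#u006" synch="#t006">
    <p>Those haven't happened. If they do, we can revisit the concern.</p>
  </post>
</div>
"""


def reply_depths(div):
    """Map each post's xml:id to its depth in the reply tree (0 = no parent)."""
    parents = {}
    for post in div.iter("post"):
        target = post.get("replyTo")
        # Assumption: a single local pointer such as "#p4".
        parents[post.get(XML_ID)] = target.lstrip("#") if target else None

    def depth(post_id, seen=()):
        parent = parents.get(post_id)
        # Stop at thread starters and guard against circular pointers.
        if parent is None or parent in seen:
            return 0
        return 1 + depth(parent, seen + (post_id,))

    return {post_id: depth(post_id) for post_id in parents}


div = ET.fromstring(THREAD)
depths = reply_depths(div)
for post in div.iter("post"):
    post_id = post.get(XML_ID)
    print(post_id, "reply depth:", depths[post_id],
          "indentLevel:", post.get("indentLevel"))
```

In this example the computed reply depth and the recorded @indentLevel coincide. That need not hold in general, since @indentLevel records how the post was indented in the source while @replyTo records which post it responds to.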

Detailed documentation on CMC-core

The CMC-core customisation (ODD and sample annotations of CMC corpus files from our projects) can be found at the TEI SIG CMC Wiki.

The rationale of CMC-core is described in detail in a paper accepted for a special issue of the Corpus journal. We kindly ask you to consult this paper for the detailed reasoning behind the additions and modifications to the TEI proposed above, and we ask the Council to consider including them as official features of the TEI encoding framework.

Wishing you all a happy new year 2020 and looking forward to your comments and replies!

Michael Beißwenger, Harald Lüngen (on behalf of the TEI SIG CMC)

bansp commented 4 years ago

This should be extremely interesting for the CLARIN community, given the overall interest in CMC nowadays. Is there a chance for a customization of CMC TEI to be served from Roma as one of the presets, perhaps?

ebeshero commented 4 years ago

VF2F2020: Council decides that those assigned to this ticket will meet to review the proposal carefully, see how it best fits into the Guidelines (perhaps as a new chapter), and return to the working group with a proposal for how to proceed.

luengen commented 4 years ago

Dear Elisa, dear colleagues, many thanks for this. Please note that we are available for further exchange and discussion, for instance in a Zoom meeting, if desired. Best - Harald (with Michael Beißwenger, for the TEI SIG CMC)

peterstadler commented 3 years ago

Just an update: a subgroup (@luengen, @beisswenger, @lujessica, @sydb, and @peterstadler) has been working on this for some months already. Development is based on https://github.com/TEI-CMC-SIG/cmc-core, and we have now shifted to a fork of the Guidelines at https://github.com/TEI-CMC-SIG/TEI/tree/cmc-features (branch 'cmc-features').

The HTML version, intended to facilitate review, is built on the TEI's Jenkins server: https://jenkins.tei-c.org/job/TEIP5-CMC-features/lastSuccessfulBuild/artifact/P5/release/doc/tei-p5-doc/en/html/CMC.html

sabineseifert commented 4 months ago

Related PR #2537