Exmaralda-Org / exmaralda

Revise COMA schema #457

Open berndmoos opened 3 months ago

berndmoos commented 3 months ago

There are three things:

  1. COMA writes files that do not conform to the schema. This (c|sh)ould be fixed
  2. A COMA file as the central element of the ZuMult COMACorpus backend could use some additional (optional) information, such as
    • which transcript belongs to which recording?
    • what is the file type of a transcription file (e.g. EXB vs. ISO/TEI)
  3. The very big thing: changes, additions for a better, stronger COMA, such as
    • Metadata for Speakers in Communications
    • Multilingual metadata
    • Other transcription formats, e.g. FLK, EAF, ISO/TEI

The second point is relevant for TGDP.
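
To make the second point concrete, here is a minimal sketch of how a consumer could resolve "which transcript belongs to which recording" once such optional information exists. The attribute names `recordingId` and `fileType` are invented for illustration and are not part of the current COMA schema. The XML fragment is a heavily simplified stand-in for a real COMA file, and Python's standard library is used only for brevity.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical COMA-like fragment. "recordingId" and "fileType"
# are assumed attribute names for this sketch, not existing schema attributes.
SAMPLE = """\
<Communication Id="C1" Name="Interview">
  <Recording Id="R1"/>
  <Recording Id="R2"/>
  <Transcription Id="T1" recordingId="R1" fileType="EXB"/>
  <Transcription Id="T2" recordingId="R2" fileType="ISO/TEI"/>
  <Transcription Id="T3"/>
</Communication>
"""


def transcripts_by_recording(communication):
    """Group (transcription Id, file type) pairs by the recording they point to.

    Transcriptions that lack the optional attribute are collected under the
    key None, so files without the new information can still be read.
    """
    mapping = {}
    for transcription in communication.iter("Transcription"):
        # .get() returns None when the optional attribute is absent
        recording = transcription.get("recordingId")
        entry = (transcription.get("Id"), transcription.get("fileType"))
        mapping.setdefault(recording, []).append(entry)
    return mapping


communication = ET.fromstring(SAMPLE)
links = transcripts_by_recording(communication)
```

Because the attributes are optional, the sketch treats a missing value as "unknown" instead of as an error, which is what lets existing files stay valid.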

berndmoos commented 3 months ago

see also: https://github.com/zumult-org/zumultapi/issues/197

berndmoos commented 3 months ago

@Herrner (who knoweth things): if I introduce an optional attribute in the schema, COMA will read it and write it, it just won't be visible to the user. Right?

Example:

<Transcription theNewAttribute="theNewValue">

Herrner commented 3 months ago

I guess so, but let me have a look first...

Herrner commented 3 months ago

When looking at the code, I'm wondering whether you would have to add anything to the schema at all, but better do so. I hardly remember anything, but COMA builds the DOM with a SAXBuilder and writes it using JDOM. Very confusing, all in all.

Herrner commented 3 months ago

I just added a new attribute to a transcription, changed stuff with the transcription in COMA and wrote the file back, and the attribute survived, so...
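
The same kind of round trip can be sketched outside COMA. This is not COMA's code: COMA is Java and uses JDOM, whereas the sketch below uses Python's `xml.etree.ElementTree`, and the `Id` attribute and `Name` child element are assumptions made for the example. It only illustrates the general behaviour that a generic tree parser keeps attributes it does not know about, as long as nothing validates against a schema or maps the XML onto fixed classes.

```python
import xml.etree.ElementTree as ET

# The attribute from the example above, on an otherwise made-up element.
SOURCE = (
    '<Transcription Id="T1" theNewAttribute="theNewValue">'
    "<Name>old</Name>"
    "</Transcription>"
)


def round_trip_with_edit(xml_text, new_name):
    """Parse, change something unrelated to the new attribute, serialize."""
    root = ET.fromstring(xml_text)
    root.find("Name").text = new_name  # the "changed stuff" step
    return ET.tostring(root, encoding="unicode")


result = round_trip_with_edit(SOURCE, "new")
reparsed = ET.fromstring(result)
```

After the edit and re-serialization, `reparsed` still carries `theNewAttribute`, which matches the observation made with COMA itself.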

berndmoos commented 3 months ago

I need the schema because I want other people to write COMA XML outside COMA. I will add surviving optional attributes then for a start. Using a SAXBuilder for reading does not mean that no (J)DOM comes out of it afaik.

I will also put the schema into Git (it is not there, or is it?).

sarkipo commented 2 months ago

> The very big thing: changes, additions for a better, stronger COMA, such as
>
>   • Metadata for Speakers in Communications
>   • Multilingual metadata
>   • Other transcription formats, e.g. FLK, EAF, ISO/TEI

All of these (3.) are very much relevant for INEL.

berndmoos commented 2 months ago

> All of these (3.) are very much relevant for INEL.

Duly noted. But this issue needs a premium sponsor. I am sort of working on it.