Closed: matyaskopp closed this issue 1 year ago
I am against new div types, especially ones like "table-of-content", because ParlaMint is an instantiation of the more general Parla-CLARIN, which states that "While the [body] will contain the transcription proper (i.e. the speeches), the [front] contains preamble text, and the [back] various appendices or texts that are related to the speeches." Furthermore, the div element there already recommends values for the type attribute, but all of them under the assumption that the div contains transcriptions.
So, for the case of #437 and under the assumption that the ToC should not be removed, as I wrote there, we have the following two options:
@type="toc"
) at the start of the div (ugly but simple)
I can do 2. if the feeling about this is strong enough.
The Parla-CLARIN guidelines state this:
If used, the values of the type and subtype attributes will depend on the parliamentary rules of the particular country, on the need to distinguish the types of divisions, as well as on the ability to automatically recognise them or the available effort to manually add them.
<body>
  <div> ...
    <div type="representation">
      <head>Representation of members of the Federal Government</head>
      ...
    </div>
    <div type="topical">
      <head>Hour of topical interest</head>
      ...
    </div>
    <div type="request">
      <head>Announcement of an urgent request</head>
      ...
    </div>
  </div>
</body>
In practice, @type=debateSection seems to be used for most ParlaMint corpora. Our data has pre-debate announcements and other generic notes, debates, post-debate announcements, and sometimes announcements and other generic notes between debate sections. For us it would be easiest to label those sections as what they are, and to use @type=debateSection wherever people are actually talking.
Is there a reason not to grant the same lenience here as Parla-CLARIN grants?
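To see which div types a given file actually uses, a quick tally of div/@type values may help. This is only a minimal sketch: the helper name `div_type_counts` and the sample document (including the "announcement" type) are made up for illustration, and it assumes the file uses the standard TEI namespace.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Standard TEI namespace, in ElementTree's "{uri}" notation.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def div_type_counts(root):
    """Count the @type values of all TEI <div> elements below root."""
    return Counter(
        div.get("type", "(untyped)") for div in root.iter(TEI_NS + "div")
    )


# Hypothetical sample; "announcement" is an invented type value.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
  <div type="announcement"><note>Pre-debate announcement</note></div>
  <div type="debateSection"><u who="#A"><seg>First speech.</seg></u></div>
  <div type="debateSection"><u who="#B"><seg>Second speech.</seg></u></div>
</body></text></TEI>"""

counts = div_type_counts(ET.fromstring(SAMPLE))
for div_type, n in counts.most_common():
    print(div_type, n)
```

For a real corpus file, `ET.parse(path).getroot()` could be passed in place of the parsed sample string.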
Agree that the ToC shouldn't be in the data, because it is reconstructible from the data (it carries no additional information).
I was thinking more about different types of div. I have just reviewed ParlaMint-NO (@tungland) (do not comment on note/@type, it is already reported in #473), and it seems that new div types are needed, because there are sections (div) containing utterances <u> but no debate (a voting section with just chair speeches that moderate the situation):
https://github.com/tungland/ParlaMint/blob/1c385b6acce7a1459878c377a79dec4672af6d7d/Data/ParlaMint-NO/ParlaMint-NO_1999-03-02-lower.xml#L418
<div type="debateSection">
  <note type="a">Etter at det var ringt til votering i fem minutt, sa</note>
  <u who="#person.JR"
     ana="#chair"
     xml:id="ParlaMint-NO_1999-03-02-lower.ud164e417"
     xml:lang="nob">
    <seg xml:id="ParlaMint-NO_1999-03-02-lower.segd164e418">Odelstinget skal votere i sakene nr. 1-4.</seg>
  </u>
  <note type="tit">Votering i sak nr. 1</note>
and divs without utterances <u>. I do not understand what is happening there, but there is definitely some structure:
<div type="debateSection">
  <note type="tit">Sak nr. 5</note>
  <note type="tit">Referat</note>
  <note type="merke">1.</note>
  <note type="refnr">(61)</note>
  <note type="a">Lov om endringar i lov av 24. juni 1994 nr. 39 om sjøfarten (sjøloven) (dispasjøreksamen) (Ot.prp. nr. 36 (1998-99))</note>
  <note type="a">Enst.: Vert send justiskomiteen.</note>
  <note type="merke">2.</note>
  <note type="refnr">(62)</note>
  <note type="a">Lov om endringer i lov av 1. mars 1985 nr. 3 om stortingsvalg, fylkestingsvalg og kommunestyrevalg (Valgloven) (Ot.prp. nr. 37 (1998-99))</note>
  <note type="a">Enst.: Vert send kontroll- og konstitusjonskomiteen.</note>
</div>
<div type="debateSection">
  <note type="slutt">Møtet slutt kl. 13.20.</note>
</div>
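To get an overview of such sections, one could profile each div by how many utterances and notes it holds. The following is a rough sketch, not part of any ParlaMint tooling: `profile_divs` and `local_name` are hypothetical helpers, the sample is abbreviated from the excerpts above, and it assumes that <u> and <note> are direct children of the div.

```python
import xml.etree.ElementTree as ET


def local_name(elem):
    """Strip a '{namespace}' prefix so this works with or without the TEI namespace."""
    return elem.tag.rsplit("}", 1)[-1]


def profile_divs(root):
    """Return (type, utterance count, note count) for every innermost div."""
    rows = []
    for div in root.iter():
        if local_name(div) != "div":
            continue
        children = [local_name(child) for child in div]
        if "div" in children:
            continue  # skip wrapper divs, profile only the innermost ones
        rows.append((div.get("type"), children.count("u"), children.count("note")))
    return rows


# Abbreviated from the ParlaMint-NO excerpts quoted in this thread.
SAMPLE = """<body>
  <div type="debateSection">
    <note type="a">Etter at det var ringt til votering i fem minutt, sa</note>
    <u who="#person.JR" ana="#chair"><seg>Odelstinget skal votere i sakene nr. 1-4.</seg></u>
    <note type="tit">Votering i sak nr. 1</note>
  </div>
  <div type="debateSection">
    <note type="tit">Sak nr. 5</note>
    <note type="tit">Referat</note>
  </div>
  <div type="debateSection">
    <note type="slutt">Møtet slutt kl. 13.20.</note>
  </div>
</body>"""

rows = profile_divs(ET.fromstring(SAMPLE))
for div_type, n_u, n_note in rows:
    print(f"{div_type}: {n_u} utterance(s), {n_note} note(s)")
```

Rows with zero utterances are the note-only sections; rows with a single chair utterance surrounded by notes are candidates for the "utterances but no debate" case.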
OK, what about then having a new type of div, <div type="notes">, which should contain only notes, possibly preceded by <head>?
@matyaskopp What is happening in your second example: a debate section is recorded as having been held, but no utterances were recorded. Possibly no arguments were made, and the transcript did not bother with the formalities from the speaker.
The context here is that this is based on an official transcript from Norway's "pseudo" lower house, Odelstinget. Norway abolished its pseudo-bicameral system in 2009, but for many years before that, meetings in these bodies were becoming increasingly ceremonial, just moving through empty formalities, with actual debate happening during joint sessions.
OK, it is the best we can think of for now. @TomazErjavec, this is probably a more consistent type value:
<div type="noteSection">...</div>
OK, did it. Good idea about "noteSection", but I then chose "commentSection", as we don't really have a guarantee that there will only be <note>s inside; some heuristic might change them to <incident> and similar, so I allow those too. The schema has not been tested much, I hope it works.
For the record, the Guidelines have been changed here and here, while the schema for div is now
https://github.com/clarin-eric/ParlaMint/blob/ad0a3a78ce8bda4cd6d5bd91bf60dcedf15e690a/Schema/ParlaMint-TEI.rng#L416-L454
Hm, I just noticed that I forgot to remove the tabs from the schema, sorry!
Removed all tabs in 0f1e1bb (also in Scripts).
@TomazErjavec, I did want to close this issue because I think there is no need (at least for now) to add new div types.
Currently, we support:
- debateSection for div sections that contain at least one utterance u
- commentSection for comment-only sections

but I noticed that the schema does not enforce u in debateSection, so I am leaving it open.
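Independently of the schema, the two rules can be checked with a few lines of script. This is only a sketch under stated assumptions: `check_div_types` and `local_name` are hypothetical names, the sample is abbreviated from the excerpts earlier in this thread, and it assumes that utterances are direct children of the div and that a commentSection must not contain any u.

```python
import xml.etree.ElementTree as ET


def local_name(elem):
    """Strip a '{namespace}' prefix so this works with or without the TEI namespace."""
    return elem.tag.rsplit("}", 1)[-1]


def check_div_types(root):
    """Return a list of problems for divs that break the two rules."""
    problems = []
    divs = (elem for elem in root.iter() if local_name(elem) == "div")
    for index, div in enumerate(divs, start=1):
        # Assumption: <u> elements sit directly under the div.
        has_u = any(local_name(child) == "u" for child in div)
        div_type = div.get("type")
        if div_type == "debateSection" and not has_u:
            problems.append(f"div #{index}: debateSection without any <u>")
        elif div_type == "commentSection" and has_u:
            problems.append(f"div #{index}: commentSection containing <u>")
    return problems


SAMPLE = """<body>
  <div type="debateSection"><u who="#person.JR"><seg>Odelstinget skal votere i sakene nr. 1-4.</seg></u></div>
  <div type="debateSection"><note type="slutt">Møtet slutt kl. 13.20.</note></div>
  <div type="commentSection"><note type="tit">Referat</note></div>
</body>"""

problems = check_div_types(ET.fromstring(SAMPLE))
print(problems)
```

A script like this only reports violations; it does not replace enforcing the constraint in the schema itself.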
This should be solved in 8103e9e, docu branch.
seems ok to me, merged to the data branch.
closing
This should be more like a hotfix for partners that have structured proceedings and don't want to remove any data. (I will extend this; I am creating it now to be able to refer to it.)