Open matyaskopp opened 10 months ago
The source transcriptions of the PT debates do not seem to contain paragraphs, but in the corpus the text is somehow segmented into paragraphs (my guess is that when punctuation such as . or ? falls at the end of a line, the paragraph <seg> ends).
On the source page, https://debates.parlamento.pt/catalogo/r3/dar/01/13/04/035/2019-01-04?sft=true#p5, the "paragraphs" are framed:
The TEI:
<seg xml:id="ParlaMint-PT_2019-01-04.seg21">Em primeiro <!-- --> privada. A segurança <!-- --> complementar.</seg>
The TEI.ana:
<seg xml:id="ParlaMint-PT_2019-01-04.seg21">
  <s xml:id="ParlaMint-PT_2019-01-04.seg21.s">
    <w xml:id="ParlaMint-PT_2019-01-04.seg21.s.1" msd="UPosTag=ADP" lemma="em">Em</w>
    <w xml:id="ParlaMint-PT_2019-01-04.seg21.s.2" msd="UPosTag=ADJ|Gender=Masc|Number=Sing" lemma="primeiro">primeiro</w>
    <!-- -->
    <w xml:id="ParlaMint-PT_2019-01-04.seg21.s.14" msd="UPosTag=ADJ|Gender=Fem|Number=Sing" lemma="privar,privado" join="right">privada</w>
    <pc xml:id="ParlaMint-PT_2019-01-04.seg21.s.15" msd="UPosTag=PUNCT">.</pc>
    <w xml:id="ParlaMint-PT_2019-01-04.seg21.s.16" msd="UPosTag=DET|Gender=Fem|Number=Sing" lemma="a">A</w>
    <w xml:id="ParlaMint-PT_2019-01-04.seg21.s.17" msd="UPosTag=NOUN|Gender=Fem|Number=Sing" lemma="segurança">segurança</w>
    <!-- -->
    <w xml:id="ParlaMint-PT_2019-01-04.seg21.s.47" msd="UPosTag=ADJ|Gender=Fem|Number=Sing" lemma="complementar" join="right">complementar</w>
    <pc xml:id="ParlaMint-PT_2019-01-04.seg21.s.48" msd="UPosTag=PUNCT">.</pc>
    <linkGrp targFunc="head argument" type="UD-SYN"><!-- --> </linkGrp>
  </s>
</seg>
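To make the guess above concrete, here is a minimal Python sketch of the heuristic I suspect was used: a paragraph is closed whenever a source line ends in . or ?. This is only my assumption about the conversion, not the actual ParlaMint-PT pipeline; the function name `split_paragraphs`, the set of closing marks, and the sample lines are all hypothetical.

```python
# Assumed set of paragraph-closing marks; the real conversion may use others.
CLOSING_MARKS = (".", "?")


def split_paragraphs(lines):
    """Group source lines into paragraphs.

    Hypothetical heuristic: a line that ends in one of CLOSING_MARKS
    closes the current paragraph; any other line is continued.
    """
    paragraphs = []
    current = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip empty lines, they carry no text
        current.append(text)
        if text.endswith(CLOSING_MARKS):
            paragraphs.append(" ".join(current))
            current = []
    if current:
        # trailing lines with no closing mark still form a paragraph
        paragraphs.append(" ".join(current))
    return paragraphs


if __name__ == "__main__":
    # Made-up sample lines, not taken from the debate transcript.
    sample = [
        "First, a line that stops mid-sentence and",
        "finishes here. A second sentence follows",
        "and ends at the line break.",
        "Is this a new paragraph?",
    ]
    for paragraph in split_paragraphs(sample):
        print(paragraph)
```

With such a rule, a full stop in the middle of a line does not close the paragraph, which would explain why seg21 above holds two sentences in a single <seg>.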