clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora
https://clarin-eric.github.io/ParlaMint/

Missing samples of submitted corpora #532

Closed by TomazErjavec 1 year ago

TomazErjavec commented 1 year ago

My understanding is that for submitted full corpora the samples should also be present - at least - in the data branch (ideally also in main). However, DK has no samples?

TomazErjavec commented 1 year ago

Actually, I just saw the many pull requests, so it might be that DK has simply not been merged yet. In this case, pls. close!

matyaskopp commented 1 year ago

yes, because

Sample:

<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="da" xml:id="ParlaMint-DK_2020-12-22-20201-M43" ana="#parla.sitting #covid">
   <teiHeader>
      <fileDesc>
         <titleStmt>
            <title type="main" xml:lang="en">The Danish parliamentary corpus ParlaMint-DK, Session 20201, Sitting M43 [ParlaMint]</title>
            <title type="sub" xml:lang="en">Hansard of the session of the Danish Parliament (Folketinget), 20201, M43 (2020-12-22), preliminary version</title>
            <meeting ana="#parla.session">20201</meeting>
            <meeting ana="#parla.sitting">M43</meeting> <!-- should be meeting -->

Delivered corpus:

<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="da" xml:id="ParlaMint-DK_2020-12-22-20201-M43" ana="#parla.sitting #covid">
   <teiHeader>
      <fileDesc>
         <titleStmt>
            <title type="main" xml:lang="en">The Danish parliamentary corpus ParlaMint-DK, Session 20201, Sitting M43 [ParlaMint]</title>
            <title type="main" xml:lang="da">Det danske korpus ParlaMint-DK, folketingsåret 20201, møde M43 [ParlaMint]</title>
            <title type="sub" xml:lang="en">Hansard of the session of the Danish Parliament (Folketinget), 20201, M43 (2020-12-22), preliminary version</title>
            <title type="sub" xml:lang="da">Referat fra folketingssalen, folketingsåret 20201, møde M43 (2020-12-22), foreløbig version</title>
            <meeting ana="#parla.session">20201</meeting>
            <meeting ana="#parla.meeting">M43</meeting>
            <!-- missing #parla.sitting specification (the file contains sitting) -->
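
The mismatch between the two excerpts above could be found automatically. What follows is only a minimal sketch, not part of the ParlaMint tooling: the function names `meeting_anas` and `ana_differences` are hypothetical, and the two inline documents are shortened, well-formed stand-ins for the excerpts quoted in this thread. It lists the `<meeting>` elements whose `ana` values differ between a sample file and the delivered file, without deciding which of the two values is the correct one.

```python
import xml.etree.ElementTree as ET

# TEI namespace, as declared on the root element in both excerpts.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def meeting_anas(tei_xml):
    """Map the text of each titleStmt/meeting element to its set of ana references."""
    root = ET.fromstring(tei_xml)
    return {
        (m.text or "").strip(): set(m.get("ana", "").split())
        for m in root.iterfind(".//tei:titleStmt/tei:meeting", TEI_NS)
    }


def ana_differences(sample_xml, delivered_xml):
    """Return {meeting text: (sample anas, delivered anas)} where the two disagree."""
    sample = meeting_anas(sample_xml)
    delivered = meeting_anas(delivered_xml)
    return {
        key: (sample.get(key, set()), delivered.get(key, set()))
        for key in sample.keys() | delivered.keys()
        if sample.get(key) != delivered.get(key)
    }


# Shortened stand-ins for the two headers quoted above (assumption: only the
# <meeting> elements matter for this check, so the titles are left out).
TEMPLATE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc><titleStmt>'
    '<meeting ana="#parla.session">20201</meeting>'
    '<meeting ana="{ana}">M43</meeting>'
    "</titleStmt></fileDesc></teiHeader></TEI>"
)
SAMPLE = TEMPLATE.format(ana="#parla.sitting")
DELIVERED = TEMPLATE.format(ana="#parla.meeting")

if __name__ == "__main__":
    for text, (in_sample, in_delivered) in ana_differences(SAMPLE, DELIVERED).items():
        print(text, sorted(in_sample), sorted(in_delivered))
```

Run over real files, the same comparison would read each document from disk and pass its content to `ana_differences`; only the `M43` entry differs here, since `20201` carries `#parla.session` in both.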