clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora
https://clarin-eric.github.io/ParlaMint/

RS: Croatian parliamentary corpus - teiCorpus title #635

Closed: matyaskopp closed this issue 1 year ago

matyaskopp commented 1 year ago

In both the TEI and TEI.ana versions, the teiCorpus title describes ParlaMint-RS (the Serbian corpus) as "Croatian": https://github.com/clarin-eric/ParlaMint/blob/937681cf012cc7330025a8052c60cb05b1bc25ae/Data/ParlaMint-RS/ParlaMint-RS.xml#L7

<title type="main" xml:lang="en">Croatian parliamentary corpus ParlaMint-RS [ParlaMint SAMPLE]</title>
5roop commented 1 year ago

Good eye, Matyáš, thanks for reporting.

Many minor corrections were made after I submitted the corpora, and the easiest way to fix this would be to edit the latest version, which I do not have.

@TomazErjavec, can you share the location of the latest corpus? I can then correct it with sed. In the meantime, I will fix the sample.
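For illustration, the kind of search-and-replace a sed one-liner would do here can be sketched in Python as below. This is a hypothetical sketch, not the script actually used in the thread: it assumes the wrong English title reads exactly as quoted in the issue and that the correct wording is "Serbian parliamentary corpus ParlaMint-RS". The root file name `ParlaMint-RS.xml` comes from the issue link; `ParlaMint-RS.ana.xml` is an assumed name for the TEI.ana root.

```python
import re
from pathlib import Path

# Matches the opening <title> tag of the English title (xml:lang="en")
# followed by the wrong "Croatian" wording; titles in other languages are untouched.
BAD_TITLE = re.compile(
    r'(<title\b[^>]*xml:lang="en"[^>]*>)Croatian parliamentary corpus ParlaMint-RS'
)


def fix_title(xml_text: str) -> str:
    """Return xml_text with the English corpus title corrected to 'Serbian'."""
    return BAD_TITLE.sub(r"\1Serbian parliamentary corpus ParlaMint-RS", xml_text)


def fix_file(path: Path) -> bool:
    """Rewrite the file in place if needed; return True if anything changed."""
    original = path.read_text(encoding="utf-8")
    fixed = fix_title(original)
    if fixed != original:
        path.write_text(fixed, encoding="utf-8")
    return fixed != original


if __name__ == "__main__":
    # Hypothetical root file names (see the note above); missing files are skipped.
    for name in ["ParlaMint-RS.xml", "ParlaMint-RS.ana.xml"]:
        p = Path(name)
        if p.exists():
            print(name, "changed" if fix_file(p) else "unchanged")
```

Anchoring the pattern on the `<title ... xml:lang="en">` tag keeps the replacement from touching other occurrences of the phrase elsewhere in the file.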

TomazErjavec commented 1 year ago

Indeed, well spotted, @matyaskopp! @5roop, your files are on new-tantra in /project/corpora/Parla/ParlaMint/V3/Data. Looking forward to the corrected version.

5roop commented 1 year ago

@TomazErjavec, @matyaskopp, this is now taken care of.

The new version is at new-tantra:/home/rupnik/parlamint2/ParlaMint_fixing_635. It seems the only bad files were the root TEI documents, both the plain-text and the ana versions.
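One way to double-check a claim like "only the root files were bad" is to scan the whole corpus directory for the wrong wording. The sketch below is hypothetical: it assumes the corpus is a directory tree of `.xml` files and that the phrase "Croatian parliamentary corpus" should not appear anywhere in the Serbian corpus. It reads each file whole, which is simple but may be slow on very large corpora.

```python
from pathlib import Path

# Wording that should not appear anywhere in ParlaMint-RS (assumption).
BAD = "Croatian parliamentary corpus"


def files_with_bad_title(root: Path) -> list[Path]:
    """Return every .xml file under root whose text contains BAD, sorted by path."""
    return sorted(
        p for p in root.rglob("*.xml")
        if BAD in p.read_text(encoding="utf-8")
    )


if __name__ == "__main__":
    # Hypothetical local path to the corpus directory.
    for p in files_with_bad_title(Path("ParlaMint-RS")):
        print(p)
```

If the output lists only the two root documents, that matches the observation above.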

TomazErjavec commented 1 year ago

> It seems the only bad files were the root TEI documents, both plain-text and ana.

If so, there is no need for me to take the complete new version: I just corrected the titles in the two files and will re-run.