clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora
https://clarin-eric.github.io/ParlaMint/

Failing Parla-CLARIN schema validation #822

Closed matyaskopp closed 7 months ago

matyaskopp commented 11 months ago

Not sure what happened, but only the countries with CHES data fail.

TomazErjavec commented 11 months ago

Was dev merged into data / main? Because there were changes to the schema to accommodate CHES in devel.

matyaskopp commented 11 months ago

> Was dev merged into data / main? Because there were changes to the schema to accommodate CHES in devel.

Yes, devel is in both the main and data branches: https://github.com/clarin-eric/ParlaMint/commits/data

matyaskopp commented 11 months ago

A mixture of state and label children is not allowed in TEI, per the content model of state (https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-state.html):

<content>
    <sequence>
        <elementRef key="precision" minOccurs="0" maxOccurs="unbounded"/>
        <alternate>
            <elementRef key="state" minOccurs="1" maxOccurs="unbounded"/>
            <sequence>
                <classRef key="model.headLike" minOccurs="0" maxOccurs="unbounded"/>
                <classRef key="model.pLike" minOccurs="1" maxOccurs="unbounded"/>
                <alternate minOccurs="0" maxOccurs="unbounded">
                    <classRef key="model.noteLike"/>
                    <classRef key="model.biblLike"/>
                </alternate>
            </sequence>
            <alternate minOccurs="0" maxOccurs="unbounded">
                <classRef key="model.labelLike"/>
                <classRef key="model.noteLike"/>
                <classRef key="model.biblLike"/>
            </alternate>
        </alternate>
    </sequence>
</content>
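
The content model above lets a state contain either nested state elements or label/note/bibl elements, but not both. A minimal sketch of a check for that mixture, using only the Python standard library, could look like the following. The sample fragment, its attribute values, and the helper names `local` and `mixed_states` are hypothetical illustrations, not taken from the ParlaMint corpora or build scripts.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def local(tag):
    """Strip the '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def mixed_states(root):
    """Return every TEI <state> that has both <state> and <label> children."""
    offenders = []
    for state in root.iter(f"{{{TEI_NS}}}state"):
        child_names = {local(child.tag) for child in state}
        if {"state", "label"} <= child_names:
            offenders.append(state)
    return offenders


# Hypothetical fragment: the element layout and attribute values are
# assumptions made for illustration only.
SAMPLE = """
<org xmlns="http://www.tei-c.org/ns/1.0" xml:id="party.X">
  <state type="politicalOrientation" source="#CHES">
    <label>Party X</label>
    <state type="lrgen" n="5.2"/>
  </state>
  <state type="other">
    <label>Only a label</label>
  </state>
</org>
"""

root = ET.fromstring(SAMPLE)
bad = mixed_states(root)
for state in bad:
    print("mixed state/label content in state of type", state.get("type"))
```

Only the first state in the sample is reported, since the second one contains a label but no nested state. This is not a substitute for validating against the Parla-CLARIN schema; it only covers the single constraint discussed here.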

TomazErjavec commented 11 months ago

Drat, missed that one. It doesn't show up in the build, as I only validate against the ParlaMint RNG schemas. The worst part is that we could have managed without label, as it only encoded the CHES name of the party.

I can't fix this now, as it would mean recompiling all the corpora from scratch (CHES/Wiki/ministers get inserted first, then all the rest of the build). I guess this means we will have to live with failing samples and non-TEI-conformant corpora for 4.0. Sorry about this.

matyaskopp commented 7 months ago

org//state/label is still in the RelaxNG schema: https://github.com/clarin-eric/ParlaMint/blob/f7170e567d94acb50c0ccd9956d8ad9a7a1c1524/Schema/ParlaMint.rng#L200-L204

TomazErjavec commented 7 months ago

OK, this has now been fixed, as evidenced by the new conversion log files. I also removed all Samples (as they fail with the new schema) and started adding new samples as they become available. So, closing.