clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora
https://clarin-eric.github.io/ParlaMint/

HU: empty notes #576

Closed matyaskopp closed 1 year ago

matyaskopp commented 1 year ago
find . -name "ParlaMint-HU_*"|xargs grep -n '<note/>'
./2014/ParlaMint-HU_2014-09-26.xml:118:               <note/>
./2017/ParlaMint-HU_2017-04-10.xml:236:               <note/>
./2017/ParlaMint-HU_2017-05-03.xml:1492:               <note/>
./2017/ParlaMint-HU_2017-11-06.xml:1857:               <note/>
./2017/ParlaMint-HU_2017-04-03.xml:532:               <note/>
./2017/ParlaMint-HU_2017-04-03.xml:785:               <note/>
./2017/ParlaMint-HU_2017-04-03.xml:838:               <note/>
./2017/ParlaMint-HU_2017-04-03.xml:1523:               <note/>
./2016/ParlaMint-HU_2016-11-11.xml:303:               <note/>
./2016/ParlaMint-HU_2016-09-12.xml:1284:               <note/>
./2016/ParlaMint-HU_2016-12-12.xml:294:               <note/>
./2016/ParlaMint-HU_2016-12-12.xml:899:               <note/>
./2016/ParlaMint-HU_2016-11-29.xml:1755:               <note/>
./2016/ParlaMint-HU_2016-11-29.xml:1900:               <note/>
./2016/ParlaMint-HU_2016-11-29.xml:1911:               <note/>
./2016/ParlaMint-HU_2016-11-29.xml:1921:               <note/>
./2016/ParlaMint-HU_2016-11-29.xml:3164:               <note/>
./2016/ParlaMint-HU_2016-12-05.xml:1949:               <note/>
./2016/ParlaMint-HU_2016-11-14.xml:179:               <note/>
./2016/ParlaMint-HU_2016-11-08.xml:422:               <note/>
./2016/ParlaMint-HU_2016-04-12.xml:1208:               <note/>
./2016/ParlaMint-HU_2016-06-07.xml:1039:               <note/>
./2016/ParlaMint-HU_2016-03-01.xml:3426:               <note/>
./2015/ParlaMint-HU_2015-10-05.xml:1519:               <note/>
./2015/ParlaMint-HU_2015-03-23.xml:551:               <note/>
./2015/ParlaMint-HU_2015-02-16.xml:339:               <note/>
./2015/ParlaMint-HU_2015-11-17.xml:1356:               <note/>
./2015/ParlaMint-HU_2015-10-20.xml:440:               <note/>
./2015/ParlaMint-HU_2015-03-30.xml:930:               <note/>
./2015/ParlaMint-HU_2015-11-16.xml:1720:               <note/>
./2015/ParlaMint-HU_2015-02-23.xml:374:               <note/>
./2015/ParlaMint-HU_2015-11-30.xml:289:               <note/>
./2015/ParlaMint-HU_2015-11-30.xml:3083:               <note/>
./2018/ParlaMint-HU_2018-05-08.xml:458:               <note/>

@TomazErjavec, this should not be allowed in the schema

TomazErjavec commented 1 year ago

> @TomazErjavec, this should not be allowed in the schema

I guess not, but it is very difficult to prevent, as we allow the time element in notes. If a note contains time, it might not contain any text. A note with time also cannot use our standard normalized-space.string, as that one does not allow leading or trailing blanks, and those are needed when combining time with text. Additionally, we also allow pb in note. So the content model would become really complicated... But if you have a suggestion for how to do it, let me know.
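
One possible way around the content-model problem is to leave the schema as it is and check for empty notes with a separate rule. The following is only a minimal sketch of that idea, not the fix that was applied here. It assumes that "empty" means a note with no child elements (so no time and no pb) and no non-whitespace text; the function names `is_empty_note` and `find_empty_notes` and the sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by ParlaMint documents.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def is_empty_note(note):
    """Return True if the note has no child elements and no non-whitespace text.

    Assumption: a note holding only <time> or <pb> counts as non-empty,
    since both are allowed inside note.
    """
    has_children = len(note) > 0
    has_text = bool((note.text or "").strip())
    return not has_children and not has_text


def find_empty_notes(xml_text):
    """Parse an XML string and return all empty TEI note elements in it."""
    root = ET.fromstring(xml_text)
    return [n for n in root.iter(f"{{{TEI_NS}}}note") if is_empty_note(n)]


# Hypothetical fragment, made up for illustration only.
sample = """<u xmlns="http://www.tei-c.org/ns/1.0">
  <note/>
  <note>   </note>
  <note><time from="10:00"/></note>
  <note>(applause)</note>
  <note><pb n="2"/></note>
</u>"""

print(len(find_empty_notes(sample)))  # prints 2
```

Unlike grepping for the literal string `<note/>`, this also catches `<note></note>` and whitespace-only notes, while notes that contain only time or pb pass.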

TomazErjavec commented 1 year ago

Fixed, closing.