clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora
https://clarin-eric.github.io/ParlaMint/

Too strict warning for desc content? #695

Closed TomazErjavec closed 1 year ago

TomazErjavec commented 1 year ago

When processing SI, I am getting a huge number of warnings like:

WARN ParlaMint-SI_2000-11-16-SDZ3-Izredna-01.ana: gap/desc in ParlaMint-SI_2000-11-16-SDZ3-Izredna-01.ana.seg45.2 has strange content "..."
WARN ParlaMint-SI_2000-11-16-SDZ3-Izredna-01.ana: gap/desc in ParlaMint-SI_2000-11-16-SDZ3-Izredna-01.ana.seg46 has strange content "..."
WARN ParlaMint-SI_2000-11-16-SDZ3-Izredna-01.ana: gap/desc in ParlaMint-SI_2000-11-16-SDZ3-Izredna-01.ana.seg47.1 has strange content "..."

The reason is this check: https://github.com/clarin-eric/ParlaMint/blob/27fbbdd7ca33fafd7ee3bbe310d2c9bd5440d020/Scripts/parlamint2release.xsl#L373

But in SI, "..." is valid content for the comment: it indicates that something is missing, so the check is too strict. @matyaskopp, could you make it less strict (in devel)?

matyaskopp commented 1 year ago

For now, I have allowed these contents: https://github.com/clarin-eric/ParlaMint/blob/2eeb4e33918cc2d8fe2e8902d9fbcd1144a3f70b/Scripts/parlamint2release.xsl#L78-L79
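
To illustrate the idea of an allow-list, here is a minimal Python sketch. It is not the XSLT from parlamint2release.xsl: the rule "content with no alphabetic character is strange" and the names `ALLOWED_DESC` and `is_strange_desc` are assumptions made for this example, and the allow-list entries are just the two most frequent values from the list below, not necessarily what the commit allows.

```python
# Hypothetical allow-list: contents that look like bare punctuation
# but are accepted as valid desc content.
ALLOWED_DESC = {"...", "…"}


def is_strange_desc(text, allowed=ALLOWED_DESC):
    """Return True if a desc should trigger a 'strange content' warning.

    Assumed rule (not taken from the real stylesheet): content is strange
    when it contains no alphabetic character, unless it is on the allow-list.
    """
    content = text.strip()
    if content in allowed:
        return False
    return not any(ch.isalpha() for ch in content)


if __name__ == "__main__":
    for sample in ["...", "…", ",", "-----", "inaudible"]:
        print(repr(sample), is_strange_desc(sample))
```

With this shape, making the check less strict for a corpus only means adding entries to the allow-list, rather than changing the rule itself.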

List of strange notes with frequencies (without the 2eeb4e3 change):

  25210 ...
  13046 …
   9248 ,
   3098 .
    957 ..
    903 –
    752 ....
    728 !
    426 ?
    231 :
    199 )
    174 §
    122 -
     69 *
     64 -----
     50 (
     36 »
     36 §§
     21 ).
     17 ;
     12 .....
     10 ​
     10 .»
      9 ".
      9 "
      8 ….
      7 ."
      7 (…):
      5 �
      5 «
      5 ?!
      5 /
      5 .�
      5 -------------------------------
      4 »:
      4 _____
      4 **
      4 %
      3 •
      3 ».
      3 ·
      3 ),
      3 (…):(…)
      3 (…)!
      2 ×
      2 ´
      2 ______
      2 ]
      2 ))
      1 ▪
      1 ⅐
      1 …»
      1 …..
      1 …"
      1 “
      1 ————
      1 ——
      1 –––––
      1 »�
      1 ____
      1 ??
      1 ......
      1 .)
      1 .(..)
      1 .".
      1 ---—
      1 -------------------------------------
      1 ------------------------------
      1 ------------
      1 ----------
      1 ------
      1 ----
      1 ---
      1 +
      1 *)
      1 )?
      1 ):
      1 ).�
      1 (…)(….).
      1 (§§
      1 %?
      1 !?