comphist / cora

A web-based, token-level annotation tool for non-standard language data
http://www.linguistics.rub.de/comphist/resources/cora/
MIT License

Empty comments are created #77

Closed by mbollmann 7 years ago

mbollmann commented 8 years ago

Originally reported by: fab-bar (Bitbucket: fab-bar, GitHub: fab-bar)


CorA somehow introduced empty comments of the following kind into the text:

#!xml
  <comment type=""></comment>

In the affected documents, these empty comments seem to appear only, and always, directly after a token that has a token-level comment. However, I was not able to reproduce their creation. Maybe an old version of CorA introduced them?

Here is a short example:

Imported version:

#!xml
<token id="t13" trans="aver">
        <dipl id="d14" trans="aver" utf="aver"/>
        <mod id="t13_m1" trans="aver" ascii="aver" utf="aver"></mod>
</token>
<token id="t14" trans="mochte">
        <dipl id="d15" trans="mochte" utf="mochte"/>
        <mod id="t14_m1" trans="mochte" ascii="mochte" utf="mochte"></mod>
</token>

Exported after annotation:

#!xml
  <token id="t12" trans="aver">
    <dipl id="t12_d1" trans="aver" utf="aver"/>
    <mod id="t12_m1" trans="aver" utf="aver" ascii="aver" checked="y">
      <pos tag="AVKO"/>
      <lemma tag="&#x14D;ver4]"/>
      <comment tag="Kleinwort vermutlich noch korrigieren!"/>
    </mod>
  </token>
  <comment type=""></comment>
  <token id="t13" trans="mochte">
    <dipl id="t13_d1" trans="mochte" utf="mochte"/>
    <mod id="t13_m1" trans="mochte" utf="mochte" ascii="mochte" checked="y">
      <pos tag="VMFIN.Prpr.3.Sg.Past.*"/>
      <lemma tag="m&#x22B;gen"/>
    </mod>
  </token>
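
As a stopgap for already-exported files, the empty elements could be stripped in a post-processing step. The following is only a minimal sketch, not part of CorA: the function name `strip_empty_comments` and the `<text>` wrapper element in the sample are assumptions made for illustration. It treats a `<comment>` as empty when its `type` attribute is the empty string and it has neither text nor children, so token-level comments of the form `<comment tag="..."/>` are left alone.

```python
import xml.etree.ElementTree as ET


def strip_empty_comments(root):
    """Remove <comment type=""> elements that have no text and no children.

    Returns the number of elements removed. Comments without a 'type'
    attribute (e.g. <comment tag="..."/> inside <mod>) are not touched.
    """
    removed = 0
    # Copy both lists so that removing children does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            is_empty = (
                child.tag == "comment"
                and child.get("type") == ""
                and not (child.text or "").strip()
                and len(child) == 0
            )
            if is_empty:
                parent.remove(child)
                removed += 1
    return removed


# Hypothetical sample modelled on the export above; <text> is an assumed root.
sample = """<text>
  <token id="t12" trans="aver">
    <mod id="t12_m1" trans="aver" utf="aver" ascii="aver" checked="y">
      <pos tag="AVKO"/>
      <comment tag="Kleinwort vermutlich noch korrigieren!"/>
    </mod>
  </token>
  <comment type=""></comment>
  <token id="t13" trans="mochte"/>
</text>"""

root = ET.fromstring(sample)
removed = strip_empty_comments(root)
print(removed)
```

This only cleans up the symptom; it does not explain where the empty elements come from.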

mbollmann commented 7 years ago

Original comment by Marcel Bollmann (Bitbucket: mbollmann, GitHub: mbollmann):


This results in invalid XML, which is a problem now that we validate documents. See #87 for how to reproduce it.

mbollmann commented 7 years ago

Original comment by Marcel Bollmann (Bitbucket: mbollmann, GitHub: mbollmann):


Issue #87 was marked as a duplicate of this issue.