LanguageMachines / libfolia

FoLiA library for C++
https://proycon.github.io/folia
GNU General Public License v3.0

extracting text() from <part> nodes ignores the space="no" attribute #47

Closed kosloot closed 3 years ago

kosloot commented 3 years ago

Given this FoLiA:

<?xml version="1.0" encoding="UTF-8"?>
<FoLiA xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://ilk.uvt.nl/folia" xml:id="hbr" generator="libfolia-v2.8" version="2.4.0">
  <metadata type="native">
    <annotations>
      <paragraph-annotation/>
      <text-annotation set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl"/>
      <part-annotation/>
      <hyphenation-annotation/>
    </annotations>
  </metadata>
  <text xml:id="hbr.text">
    <p xml:id="hbr.text.p">
      <part xml:id="hbr.text.part.1" space="no">
        <t>White<t-hbr/>water Moun<t-hbr/></t>
      </part>
      <part xml:id="hbr.text.part.2">
        <t>tains.</t>
      </part>
    </p>
  </text>
</FoLiA>

The Python function folia2txt (correctly) extracts the text: Whitewater Mountains.

But its C++ counterpart, FoLiA-2text, extracts: Whitewater Moun tains. It ignores the space="no" attribute on the first <part>. This is most probably a bug in libfolia.
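To make the expected behaviour concrete, here is a minimal sketch in plain Python using only the standard library. It is not the folia2txt or libfolia implementation. It assumes a simplified rule: each <part> is followed by a single space unless it carries space="no", and <t-hbr/> contributes no text. The helper names (part_text, join_parts, join_parts_ignoring_space) are hypothetical.

```python
import xml.etree.ElementTree as ET

# FoLiA namespace, as declared in the document above
NS = "{http://ilk.uvt.nl/folia}"

# Reduced copy of the paragraph from the report (ids and metadata omitted)
DOC = """<p xmlns="http://ilk.uvt.nl/folia">
  <part space="no">
    <t>White<t-hbr/>water Moun<t-hbr/></t>
  </part>
  <part>
    <t>tains.</t>
  </part>
</p>"""


def part_text(part):
    """Text of one <part>: its <t> content, with <t-hbr/> markers dropped."""
    t = part.find(NS + "t")
    return "".join(t.itertext()).strip() if t is not None else ""


def join_parts(parent):
    """Concatenate <part> texts, honouring space="no" (assumed rule)."""
    pieces = []
    for part in parent.findall(NS + "part"):
        pieces.append(part_text(part))
        # Assumption: space defaults to "yes", so a delimiter follows
        # every part that does not explicitly say space="no".
        if part.get("space", "yes") != "no":
            pieces.append(" ")
    return "".join(pieces).strip()


def join_parts_ignoring_space(parent):
    """What the reported output looks like: always join with a space."""
    return " ".join(part_text(p) for p in parent.findall(NS + "part"))


root = ET.fromstring(DOC)
print(join_parts(root))
print(join_parts_ignoring_space(root))
```

The first call honours the attribute and glues the two parts together; the second reproduces the stray space seen in the reported output.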

kosloot commented 3 years ago

This seems to be fixed in libfolia now.