elexis-eu / tei2ontolex

TEI to OntoLex Conversion
Apache License 2.0

dictScrap #17

Open kernc opened 3 years ago

kernc commented 3 years ago

Say I have a dictionary like:

<TEI xmlns:m="http://elex.is/wp1/teiLex0Mapper/meta" xmlns:a="http://elex.is/wp1/teiLex0Mapper/legacyAttributes" xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>hebrew_syns-wordnet</title>
      </titleStmt>
      <extent></extent>
      <publicationStmt>
        <publisher>Wordnet</publisher>
        <availability>
          <licence></licence>
        </availability>
        <date when=""></date>
        <idno></idno>
      </publicationStmt>
      <sourceDesc>
        <p></p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <entry m:e="synonym" a:num="1214" a:ev="u" a:idmorpho="0" a:stat="assigned" a:tgr="tg1" a:modify="2005-06-22" xml:lang="" type="null" xml:id="entry_1">
        <form type="lemma">
          <orth m:e="lemma">!מַשֶׁהוּ</orth>
        </form>
        <dictScrap>
          <seg m:e="dictinfo" a:bidict="melingo" a:bisense="1" a:monodict="rav-milim"/>
          <cit type="translationEquivalent"><quote m:e="teqs" xml:lang="eng">something</quote></cit>
          <seg m:e="comment">#ASSIGN:50=[gnd=39.0,dfl=11.0] #MZ@#IE too general</seg>
          <seg m:e="history">eyal-22 Jun 2005, assign-6 Nov 2003</seg>
        </dictScrap>
        <gramGrp>
          <gram type="pos">n</gram>
        </gramGrp>
      </entry>
      <entry m:e="synonym" a:num="8653" a:ev="yy" a:idmorpho="95136" a:stat="checked" a:modify="2005-06-22" xml:lang="" type="null" xml:id="entry_2">
        <form type="lemma">
          <orth m:e="lemma">יֵשׁוּת</orth>
        </form>
        <dictScrap>
          <seg m:e="dictinfo" a:bidict="melingo" a:bisense="1.1" a:monodict="rav-milim" a:monosense="1.1"/>
          <cit type="translationEquivalent"><quote m:e="teqs" xml:lang="eng">entity</quote></cit><seg m:e="comment">#IE</seg>
          <seg m:e="history">eyal-22 Jun 2005</seg>
        </dictScrap>
        <gramGrp>
          <gram type="pos">n</gram>
        </gramGrp>
      </entry>
    </body>
  </text>
</TEI>

After the transformation, the resulting XML (with PR #16 applied):

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://lari-datasets.ilc.cnr.it/nenu_sample#" xmlns:void="http://rdfs.org/ns/void#" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:ns="http://creativecommons.org/ns#" xmlns:lime="http://www.w3.org/ns/lemon/lime#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:lexinfo="http://www.lexinfo.net/ontology/3.0/lexinfo#" xmlns:lexicog="http://www.w3.org/ns/lemon/lexicog#" xmlns:dct="http://purl.org/dc/terms/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:ontolex="http://www.w3.org/ns/lemon/ontolex#" xmlns:vann="http://purl.org/vocab/vann/" xmlns:tei="http://www.tei-c.org/ns/1.0" xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <lime:Lexicon>
    <dc:title>hebrew_syns-wordnet</dc:title>
    <dc:publisher>Wordnet</dc:publisher>
    <lime:entry>
      <ontolex:LexicalEntry rdf:ID="entry_1">
        <ontolex:canonicalForm>
          <rdf:Description>
            <ontolex:writtenRep xml:lang="">!מַשֶׁהוּ</ontolex:writtenRep>
          </rdf:Description>
        </ontolex:canonicalForm>
        <dictScrap xmlns="http://www.tei-c.org/ns/1.0">
          <seg xmlns:m="http://elex.is/wp1/teiLex0Mapper/meta" xmlns:a="http://elex.is/wp1/teiLex0Mapper/legacyAttributes" m:e="dictinfo" a:bidict="melingo" a:bisense="1" a:monodict="rav-milim"/>
          <lexinfo:senseTranslation xmlns="http://lari-datasets.ilc.cnr.it/nenu_sample#" xml:lang="eng">something</lexinfo:senseTranslation>
          <seg xmlns:m="http://elex.is/wp1/teiLex0Mapper/meta" m:e="comment">#ASSIGN:50=[gnd=39.0,dfl=11.0] #MZ@#IE too general</seg>
          <seg xmlns:m="http://elex.is/wp1/teiLex0Mapper/meta" m:e="history">eyal-22 Jun 2005, assign-6 Nov 2003</seg>
        </dictScrap>
      </ontolex:LexicalEntry>
    </lime:entry>
    <lime:entry>
      <ontolex:LexicalEntry rdf:ID="entry_2">
        <ontolex:canonicalForm>
          <rdf:Description>
            <ontolex:writtenRep xml:lang="">יֵשׁוּת</ontolex:writtenRep>
          </rdf:Description>
        </ontolex:canonicalForm>
        <dictScrap xmlns="http://www.tei-c.org/ns/1.0">
          <seg xmlns:m="http://elex.is/wp1/teiLex0Mapper/meta" xmlns:a="http://elex.is/wp1/teiLex0Mapper/legacyAttributes" m:e="dictinfo" a:bidict="melingo" a:bisense="1.1" a:monodict="rav-milim" a:monosense="1.1"/>
          <lexinfo:senseTranslation xmlns="http://lari-datasets.ilc.cnr.it/nenu_sample#" xml:lang="eng">entity</lexinfo:senseTranslation>
          <seg xmlns:m="http://elex.is/wp1/teiLex0Mapper/meta" m:e="comment">#IE</seg>
          <seg xmlns:m="http://elex.is/wp1/teiLex0Mapper/meta" m:e="history">eyal-22 Jun 2005</seg>
        </dictScrap>
      </ontolex:LexicalEntry>
    </lime:entry>
  </lime:Lexicon>
</rdf:RDF>

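is not valid RDF: the parser reports "Multiple children of property element" for the dictScrap elements. Each dictScrap sits directly under an ontolex:LexicalEntry node, so it is read as a property element, and in RDF/XML a property element can contain at most one child node element.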

What would be a good way to deal with that?
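
To make the error easier to locate, here is a minimal sketch in Python (standard library only) that walks RDF/XML output in its alternating node/property structure and reports property elements holding more than one child element. The function name `find_overfull_properties` and the trimmed-down sample are assumptions made for illustration; they are not part of tei2ontolex, and the check is only a rough approximation of what a real RDF parser enforces.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def find_overfull_properties(node):
    """Yield property elements that contain more than one child element.

    `node` is treated as an RDF/XML node element, so each of its children
    is a property element, which may hold at most one nested node element.
    """
    for prop in node:
        # parseType="Literal", "Resource" and "Collection" follow different
        # rules, so this sketch simply skips them.
        if prop.get(f"{{{RDF_NS}}}parseType") is not None:
            continue
        children = list(prop)
        if len(children) > 1:
            yield prop
        elif len(children) == 1:
            # Exactly one child: it is a node element, so descend into it.
            yield from find_overfull_properties(children[0])


# Hypothetical, shortened version of the converter output shown above.
SAMPLE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:lime="http://www.w3.org/ns/lemon/lime#"
         xmlns:ontolex="http://www.w3.org/ns/lemon/ontolex#"
         xmlns:lexinfo="http://www.lexinfo.net/ontology/3.0/lexinfo#">
  <lime:Lexicon>
    <lime:entry>
      <ontolex:LexicalEntry rdf:ID="entry_2">
        <ontolex:canonicalForm>
          <rdf:Description>
            <ontolex:writtenRep>lemma</ontolex:writtenRep>
          </rdf:Description>
        </ontolex:canonicalForm>
        <dictScrap xmlns="http://www.tei-c.org/ns/1.0">
          <seg>#IE</seg>
          <lexinfo:senseTranslation xml:lang="eng">entity</lexinfo:senseTranslation>
          <seg>eyal-22 Jun 2005</seg>
        </dictScrap>
      </ontolex:LexicalEntry>
    </lime:entry>
  </lime:Lexicon>
</rdf:RDF>
"""

root = ET.fromstring(SAMPLE.strip())
# Children of rdf:RDF are node elements; check each of them in turn.
bad = [prop.tag for top in root for prop in find_overfull_properties(top)]
print(bad)
```

On this sample the only element reported is the TEI dictScrap, which matches the parser message: it is the copied-through TEI fragment, not the OntoLex part of the output, that breaks the RDF/XML structure.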

laurentromary commented 3 years ago

Hi kernc. It's Laurent here. In keeping with the spirit in which we set up TEI Lex 0, I would suggest first making the content TEI Lex 0 compliant and then running the transform. The source data is a little unusual: it is under-encoded from a TEI point of view and contains quite a few additional hacks. @ttasovac, what do you think?
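
As a rough illustration of what such a pre-processing step might look like, the sketch below lifts the translation equivalents out of dictScrap before the conversion runs. It rests on an assumption that is not confirmed in this thread, namely that the cit elements belong inside a sense element on the entry. The helper `lift_translations` is hypothetical, the sample entry is shortened, and the leftover seg elements are deliberately left untouched.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI)  # serialize TEI as the default namespace


def lift_translations(entry):
    """Move <cit> children of each <dictScrap> into a new <sense> on the entry.

    Assumption for illustration only: translation equivalents are meant to
    live in a <sense>. Returns the number of <cit> elements moved.
    """
    moved = 0
    for scrap in entry.findall(f"{{{TEI}}}dictScrap"):
        cits = scrap.findall(f"{{{TEI}}}cit")
        if not cits:
            continue
        sense = ET.SubElement(entry, f"{{{TEI}}}sense")
        for cit in cits:
            scrap.remove(cit)
            cit.tail = None  # drop whitespace that followed the old position
            sense.append(cit)
            moved += 1
    return moved


# Hypothetical, shortened entry modelled on the dictionary above.
SAMPLE = """
<entry xmlns="http://www.tei-c.org/ns/1.0" xml:id="entry_2">
  <form type="lemma"><orth>lemma</orth></form>
  <dictScrap>
    <cit type="translationEquivalent"><quote xml:lang="eng">entity</quote></cit>
    <seg>#IE</seg>
  </dictScrap>
</entry>
"""

entry = ET.fromstring(SAMPLE.strip())
moved = lift_translations(entry)
print(ET.tostring(entry, encoding="unicode"))
```

Whether the remaining seg elements should become notes, be dropped, or be kept in dictScrap is a separate encoding decision that this sketch does not attempt to make.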