elexis-eu / tei2ontolex

TEI to OntoLex Conversion
Apache License 2.0

Dealing with embedded entries #14

Open laurentromary opened 4 years ago

laurentromary commented 4 years ago

It is common practice to embed entries within entries, for instance when compounds are described in the entry of the main headword. The constructs are (re stands for related entry):

<entry>
   <re>...</re>
</entry>

or

<entry>
   <entry>...</entry>
</entry>

Is there one construct for this, or several (for different types of relations between the main entry and the subentry)?

laurentromary commented 4 years ago

Here's an example:

 <entry xml:id="aboi" n="1906-001_unknown">
   <form type="lemma"><orth>ABOI</orth></form>
   <gramGrp>
      <pos expand="nom">n.</pos>
      <gen expand="masculin">m.</gen>
   </gramGrp>
   <etym><pc>(</pc>de <mentioned>aboyer</mentioned><pc>)</pc><pc>.</pc></etym>
   <sense>
      <def>Cri du chien</def><pc>.</pc>
   </sense>
   <re type="gram">
      <gramGrp><pos expand="nom">N.</pos>
            <gen expand="masculin">m.</gen>
            <number expand="pluriel">pl.</number></gramGrp>
       <sense><def>Dernières extrémités où le cerf est réduit</def><pc>.</pc></sense>
       <sense><usg type="style" rend="italic" expand="figuré">Fig.</usg>
            <def>Situation désespérée</def>
           <pc>:</pc>
            <cit type="example">
              <quote>commerçant ruiné et aux abois</quote>
            </cit><pc>.</pc>
      </sense>
   </re>
</entry>

jmccrae commented 4 years ago

This may be quite tricky as you would need to create a lexicog:Entry and relate it to the ontolex:LexicalEntry as described in https://www.w3.org/2019/09/lexicog/

BTW the <re> tag does not seem to be part of the spec

laurentromary commented 4 years ago

Re: <re> I am trying to keep some alignment with mainstream TEI. If we know how to do one we could as well implement the other.

laurentromary commented 4 years ago

Can you outline an XML snippet showing how you would do this?

laurentromary commented 4 years ago

The following should be a TEI Lex 0 compatible structure that you could add to the test samples:

<entry xml:id="aboi" n="1906-001_unknown">
   <form type="lemma"><orth>ABOI</orth></form>
   <gramGrp>
      <pos expand="nom">n.</pos>
      <gen expand="masculin">m.</gen>
   </gramGrp>
   <etym><pc>(</pc>de <mentioned>aboyer</mentioned><pc>)</pc><pc>.</pc></etym>
   <sense>
      <def>Cri du chien</def><pc>.</pc>
   </sense>
   <entry type="gram">
      <gramGrp><pos expand="nom">N.</pos>
            <gen expand="masculin">m.</gen>
            <number expand="pluriel">pl.</number></gramGrp>
       <sense><def>Dernières extrémités où le cerf est réduit</def><pc>.</pc></sense>
       <sense><usg type="style" rend="italic" expand="figuré">Fig.</usg>
            <def>Situation désespérée</def>
           <pc>:</pc>
            <cit type="example">
              <quote>commerçant ruiné et aux abois</quote>
            </cit><pc>.</pc>
      </sense>
   </entry>
</entry>

jmccrae commented 4 years ago

So you need something like this

<lexicog:Entry>
  <lexicog:describes>
     <ontolex:LexicalEntry rdf:ID="aboi">
        <!-- As before -->
     </ontolex:LexicalEntry>
  </lexicog:describes>
  <lexicog:subComponent>
    <lexicog:Entry>
      <lexicog:describes>
        <ontolex:LexicalEntry>
            <!-- Subentry goes here --> 
        </ontolex:LexicalEntry>
      </lexicog:describes>
    </lexicog:Entry>
  </lexicog:subComponent>
</lexicog:Entry>

I think the issue is we want to generate all this structure only when there are subentries.
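
The conditional wrapping described above can be sketched as follows. This is only an illustration in Python, not the converter's actual code: the function names (`lexical_entry`, `described`, `convert`) are hypothetical, the TEI input is assumed to carry no namespace (as in the snippets in this thread), and the conversion of forms and senses is left out. Subentries nested more than one level deep are not handled.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
LEXICOG = "http://www.w3.org/ns/lemon/lexicog#"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def lexical_entry(tei_entry):
    """Build a bare ontolex:LexicalEntry (forms and senses omitted in this sketch)."""
    le = ET.Element(f"{{{ONTOLEX}}}LexicalEntry")
    entry_id = tei_entry.get(f"{{{XML_NS}}}id")
    if entry_id:
        le.set(f"{{{RDF}}}ID", entry_id)
    return le


def described(tei_entry):
    """Wrap a LexicalEntry in lexicog:Entry / lexicog:describes."""
    entry = ET.Element(f"{{{LEXICOG}}}Entry")
    describes = ET.SubElement(entry, f"{{{LEXICOG}}}describes")
    describes.append(lexical_entry(tei_entry))
    return entry


def convert(tei_entry):
    """Emit the lexicog structure only when the entry has direct subentries."""
    # Assumption: subentries are direct <entry> or <re> children without a namespace.
    subentries = [child for child in tei_entry if child.tag in ("entry", "re")]
    if not subentries:
        return lexical_entry(tei_entry)
    top = described(tei_entry)
    for sub in subentries:
        component = ET.SubElement(top, f"{{{LEXICOG}}}subComponent")
        component.append(described(sub))
    return top
```

Calling `convert` on an entry without `<re>` or nested `<entry>` children returns a plain `ontolex:LexicalEntry`, while an entry such as `aboi` above comes back wrapped in `lexicog:Entry` with one `lexicog:subComponent` per subentry.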

laurentromary commented 4 years ago

Quick question: can we have this without the subcomponents, i.e.:

<lexicog:Entry>
  <lexicog:describes>
     <ontolex:LexicalEntry rdf:ID="aboi">
        <!-- As before -->
     </ontolex:LexicalEntry>
  </lexicog:describes>
</lexicog:Entry>

Or should the lexicog:Entry be dropped in this case? I am asking to find out whether I need to test for the existence of subentries.

laurentromary commented 4 years ago

I have implemented this with a test. Could you have a look at: https://github.com/elexis-eu/tei2ontolex/blob/master/GeneratedOntolex/PLI_1906_20190109%20sample.xml

jmccrae commented 4 years ago

Here are the results of running it through RDF validation:

-> % rapper -o turtle GeneratedOntolex/PLI_1906_20190109\ sample.xml >/dev/null
rapper: Parsing URI file:///home/jmccrae/projects/elexis/tei2ontolex/GeneratedOntolex/PLI_1906_20190109%20sample.xml with parser rdfxml
rapper: Serializing with serializer turtle
rapper: Error - URI file:///home/jmccrae/projects/elexis/tei2ontolex/GeneratedOntolex/PLI_1906_20190109%20sample.xml:1588 - Using an attribute 'expand' without a namespace is forbidden.
rapper: Error - URI file:///home/jmccrae/projects/elexis/tei2ontolex/GeneratedOntolex/PLI_1906_20190109%20sample.xml:1588 - Using an attribute 'rend' without a namespace is forbidden.
rapper: Failed to parse file GeneratedOntolex/PLI_1906_20190109 sample.xml rdfxml content
rapper: Parsing returned 721 triples
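
Both errors in the log concern an attribute without a namespace (`expand` and `rend`, which also occur in the TEI input). A quick way to locate such attributes before running the full validation could look like the sketch below. It is a rough pre-check in Python, not a substitute for `rapper`: the function name `unqualified_attributes` is hypothetical, and the check is deliberately simplistic (it flags every unqualified attribute and does not treat content inside `rdf:parseType="Literal"` specially).

```python
import xml.etree.ElementTree as ET


def unqualified_attributes(rdfxml_text):
    """Return (element tag, attribute name) pairs for attributes that have no namespace."""
    problems = []
    for elem in ET.fromstring(rdfxml_text).iter():
        for name in elem.attrib:
            # ElementTree spells namespaced attribute names as "{uri}local";
            # a name without the leading brace has no namespace.
            if not name.startswith("{"):
                problems.append((elem.tag, name))
    return problems
```

Running this over the generated file's text lists the elements that still carry attributes copied over unchanged from the TEI source.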

laurentromary commented 4 years ago

That comes from a case I had left aside: proverbs. Can you tell me how you would use http://www.lexinfo.net/ontology/3.0/lexinfo#proverb concretely? The example is:

<re type="prov">
   <form type="phrase"><orth>Petite pluie abat grand vent</orth></form>
   <sense>...</sense></re>

Would something like:

<ontolex:LexicalEntry>
    <ontolex:canonicalForm>
        <rdf:Description>
            <ontolex:writtenRep xml:lang="fr">Petite pluie abat grand vent</ontolex:writtenRep>
        </rdf:Description>
    </ontolex:canonicalForm>
    <lexinfo:termType rdf:resource="http://www.lexinfo.net/ontology/3.0/lexinfo#proverb"/>
</ontolex:LexicalEntry>

be OK?

jmccrae commented 4 years ago

Yes, proverb is the object of the termType property.
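
For illustration, the mapping agreed on above can be sketched in Python. This is a sketch under stated assumptions rather than the converter's implementation: the function name `proverb_entry` is hypothetical, the TEI input is assumed to carry no namespace (as in the snippet above), the language defaults to `fr` as in the example, and the `<sense>` content is ignored.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
LEXINFO = "http://www.lexinfo.net/ontology/3.0/lexinfo#"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Register readable prefixes so that serialization does not produce ns0, ns1, ...
ET.register_namespace("rdf", RDF)
ET.register_namespace("ontolex", ONTOLEX)
ET.register_namespace("lexinfo", LEXINFO)


def proverb_entry(re_elem, lang="fr"):
    """Map <re type="prov"> to an ontolex:LexicalEntry typed as a proverb."""
    entry = ET.Element(f"{{{ONTOLEX}}}LexicalEntry")

    # canonicalForm / rdf:Description / writtenRep, as in the snippet above
    form = ET.SubElement(entry, f"{{{ONTOLEX}}}canonicalForm")
    description = ET.SubElement(form, f"{{{RDF}}}Description")
    written = ET.SubElement(description, f"{{{ONTOLEX}}}writtenRep")
    written.set(f"{{{XML_NS}}}lang", lang)
    written.text = re_elem.findtext("form/orth")

    # proverb is the object of the termType property
    term_type = ET.SubElement(entry, f"{{{LEXINFO}}}termType")
    term_type.set(f"{{{RDF}}}resource", LEXINFO + "proverb")
    return entry
```

Serializing the result with `ET.tostring(proverb_entry(re_elem), encoding="unicode")` should give a structure equivalent to the proposed snippet, with the namespace declarations placed on the root element.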