alpheios-project / morphsvc

python morphology service
GNU General Public License v3.0

whitaker's: output for accurrit is incorrect? #4

Open balmas opened 6 years ago

balmas commented 6 years ago

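We get two entries for accurrit: one has a meaning and the other does not. Otherwise they are nearly identical (the headwords differ in one principal part, and the second entry has an extra perfect-tense inflection).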

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <oac:Annotation xmlns:oac="http://www.openannotation.org/ns/" rdf:about="urn:TuftsMorphologyService:accurrit:whitakerLat">
    <dcterms:creator xmlns:dcterms="http://purl.org/dc/terms/">
      <foaf:Agent xmlns:foaf="http://xmlns.com/foaf/0.1/" rdf:about="net.alpheios:tools:wordsxml.v1"/>
    </dcterms:creator>
    <dcterms:created xmlns:dcterms="http://purl.org/dc/terms/">2018-08-08T15:17:26.179853</dcterms:created>
    <dc:rights xmlns:dc="http://purl.org/dc/elements/1.1/">Short definitions and morphology from Words by William Whitaker, Copyright 1993-2007.</dc:rights>
    <oac:hasTarget>
      <rdf:Description rdf:about="urn:word:accurrit"/>
    </oac:hasTarget>
    <dc:title xmlns:dc="http://purl.org/dc/elements/1.1/"/>
    <oac:hasBody rdf:resource="urn:uuid:idm140640518744904"/>
    <oac:Body rdf:about="urn:uuid:idm140640518744904">
      <rdf:type rdf:resource="cnt:ContentAsXML"/>
      <cnt:rest xmlns:cnt="http://www.w3.org/2008/content#">
        <entry>
          <infl>
            <term xml:lang="lat">
              <stem>accurr</stem>
              <suff>it</suff>
            </term>
            <pofs order="3">verb</pofs>
            <conj>3rd</conj>
            <var>1st</var>
            <tense>present</tense>
            <voice>active</voice>
            <mood>indicative</mood>
            <pers>3rd</pers>
            <num>singular</num>
          </infl>
          <dict>
            <hdwd xml:lang="lat">accurro, accurrere, accucurri, accursus</hdwd>
            <pofs order="3">verb</pofs>
            <conj>3rd</conj>
            <freq order="5">frequent</freq>
            <src>Ox.Lat.Dict.</src>
          </dict>
        </entry>
      </cnt:rest>
    </oac:Body>
    <oac:hasBody rdf:resource="urn:uuid:idm140640523261464"/>
    <oac:Body rdf:about="urn:uuid:idm140640523261464">
      <rdf:type rdf:resource="cnt:ContentAsXML"/>
      <cnt:rest xmlns:cnt="http://www.w3.org/2008/content#">
        <entry>
          <infl>
            <term xml:lang="lat">
              <stem>accurr</stem>
              <suff>it</suff>
            </term>
            <pofs order="3">verb</pofs>
            <conj>3rd</conj>
            <var>1st</var>
            <tense>present</tense>
            <voice>active</voice>
            <mood>indicative</mood>
            <pers>3rd</pers>
            <num>singular</num>
          </infl>
          <infl>
            <term xml:lang="lat">
              <stem>accurr</stem>
              <suff>it</suff>
            </term>
            <pofs order="3">verb</pofs>
            <conj>3rd</conj>
            <var>1st</var>
            <tense>perfect</tense>
            <voice>active</voice>
            <mood>indicative</mood>
            <pers>3rd</pers>
            <num>singular</num>
          </infl>
          <dict>
            <hdwd xml:lang="lat">accurro, accurrere, accurri, accursus</hdwd>
            <pofs order="3">verb</pofs>
            <conj>3rd</conj>
            <freq order="5">frequent</freq>
            <src>Ox.Lat.Dict.</src>
          </dict>
          <mean>run/hasten to (help); come/rush up (inanim subj.); charge, rush to attack;</mean>
        </entry>
      </cnt:rest>
    </oac:Body>
  </oac:Annotation>
</rdf:RDF>
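
A minimal sketch of how the missing meaning could be detected programmatically, assuming the response is parsed with Python's standard `xml.etree.ElementTree`. `SAMPLE` is a trimmed copy of the response above (only `entry`, `dict`, `hdwd` and `mean` are kept), and `entries_without_meaning` is a hypothetical helper name, not part of morphsvc.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response above: only the elements the check needs.
SAMPLE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <cnt:rest xmlns:cnt="http://www.w3.org/2008/content#">
    <entry>
      <dict><hdwd xml:lang="lat">accurro, accurrere, accucurri, accursus</hdwd></dict>
    </entry>
  </cnt:rest>
  <cnt:rest xmlns:cnt="http://www.w3.org/2008/content#">
    <entry>
      <dict><hdwd xml:lang="lat">accurro, accurrere, accurri, accursus</hdwd></dict>
      <mean>run/hasten to (help); come/rush up (inanim subj.); charge, rush to attack;</mean>
    </entry>
  </cnt:rest>
</rdf:RDF>
"""


def entries_without_meaning(xml_text):
    """Return the headword of every <entry> that has no non-empty <mean>."""
    root = ET.fromstring(xml_text)
    missing = []
    # <entry> and its children carry no namespace in the response.
    for entry in root.iter("entry"):
        mean = entry.findtext("mean")
        if mean is None or not mean.strip():
            missing.append(entry.findtext("dict/hdwd"))
    return missing


print(entries_without_meaning(SAMPLE))
```

For the sample this reports only the `accucurri` headword, which is the entry that came back without a meaning.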
balmas commented 6 years ago

This appears to be a slight variation on #3.

It's another case of alternate spellings for the principal parts (accucurri vs. accurri):

accurr.it V 3 1 PRES ACTIVE IND 3 S
accurro, accurrere, accucurri, accursus V [XXXBO]
accurr.it V 3 1 PRES ACTIVE IND 3 S
accurr.it V 3 1 PERF ACTIVE IND 3 S
accurro, accurrere, accurri, accursus V [XXXBO]
run/hasten to (help); come/rush up (inanim subj.); charge, rush to attack;

It looks like the wordsxml wrapper around words is sensitive to whether anything comes between the dictionary entries in the original words output. If two identical lemma entries are adjacent, as for aberis, it may aggregate them; if any inflected forms come in between, it does not.

E.g. here is the output for aberis:

ab.eris V 5 1 FUT ACTIVE IND 2 S
absum, abesse, abfui, abfuturus V [XXXDS] lesser
absum, abesse, afui, afuturus V [XXXAO]
be away/absent/distant/missing; be free/removed from; be lacking; be distinct;
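
One possible direction, sketched under assumptions: aggregate entries by lemma regardless of whether they are adjacent in the words output. This is not the wordsxml implementation. The `Entry` class, `merge_entries`, and the rule "same first principal part and same part of speech means same lemma" are all hypothetical and only illustrate order-insensitive merging.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical stand-in for one <entry>; not the wordsxml data model."""
    hdwds: list      # headword lines (principal parts), one per spelling variant
    pofs: str        # part of speech
    infls: list      # inflection lines, kept as plain strings
    mean: str = ""   # short definition; empty if words printed none


def merge_entries(entries):
    """Merge entries that share a lemma, whether or not they are adjacent."""
    merged = {}  # dict preserves insertion order, so first appearance wins
    for e in entries:
        # Assumption: first principal part + part of speech identifies the lemma.
        key = (e.hdwds[0].split(",")[0].strip(), e.pofs)
        target = merged.get(key)
        if target is None:
            merged[key] = Entry(list(e.hdwds), e.pofs, list(e.infls), e.mean)
            continue
        # Keep alternate spellings and inflections, dropping exact duplicates.
        target.hdwds += [h for h in e.hdwds if h not in target.hdwds]
        target.infls += [i for i in e.infls if i not in target.infls]
        # Keep the first non-empty meaning.
        target.mean = target.mean or e.mean
    return list(merged.values())


# The two accurrit entries from the words output quoted in this issue.
entries = [
    Entry(["accurro, accurrere, accucurri, accursus"], "verb",
          ["accurr.it V 3 1 PRES ACTIVE IND 3 S"]),
    Entry(["accurro, accurrere, accurri, accursus"], "verb",
          ["accurr.it V 3 1 PRES ACTIVE IND 3 S",
           "accurr.it V 3 1 PERF ACTIVE IND 3 S"],
          "run/hasten to (help); come/rush up (inanim subj.); charge, rush to attack;"),
]
result = merge_entries(entries)
print(len(result), result[0].hdwds, result[0].infls, result[0].mean, sep="\n")
```

With this rule the two accurrit entries collapse into one that carries both headword spellings, both inflections, and the meaning. The keying rule is deliberately naive: true homographs that share a first principal part and part of speech would also be merged, so a real fix would need a stricter test.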