globalwordnet / english-wordnet

The Open English WordNet
https://en-word.net/

Split lexical entries #283

Closed: 1313ou closed this issue 4 years ago

1313ou commented 4 years ago

Shouldn't the two lexical entries below be merged into one lexical entry with two senses? They have the same form, and the sense index 'n' runs across both of them (n="0" in the first, n="1" in the second). The principle is that a lexical entry is not an arbitrary subgrouping of senses, and there should not be clones that differ only by their id within the same lexfile.

This comes from a questionable import of adjectives that treated the adjective-position suffix as part of the lemma: the id fragment 'aware-p-' is derived from 'aware(p)'.
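To make the mechanism concrete, here is a small Python sketch. It assumes (hypothetically) that the import simply turned each parenthesis of a PWN position-marked lemma such as 'aware(p)' into a hyphen, which reproduces the entry id 'ewn-aware-p--a' shown below. The names split_position and naive_id_fragment are illustrative only, not part of any existing tool. The markers follow the PWN convention: 'a' (prenominal), 'p' (predicative), 'ip' (immediately postnominal).

```python
import re
from typing import Optional, Tuple

# PWN adjective position markers: (a) prenominal, (p) predicative,
# (ip) immediately postnominal.
POSITION_RE = re.compile(r"^(?P<lemma>.+?)\((?P<marker>a|p|ip)\)$")


def split_position(raw_lemma: str) -> Tuple[str, Optional[str]]:
    """Split 'aware(p)' into ('aware', 'p'); unmarked lemmas give (lemma, None)."""
    m = POSITION_RE.match(raw_lemma)
    if m is None:
        return raw_lemma, None
    return m.group("lemma"), m.group("marker")


def naive_id_fragment(raw_lemma: str) -> str:
    """Hypothetical reconstruction of the import: parentheses become hyphens."""
    return raw_lemma.replace("(", "-").replace(")", "-")


print(naive_id_fragment("aware(p)"))                  # aware-p-
print("ewn-" + naive_id_fragment("aware(p)") + "-a")  # ewn-aware-p--a
print(split_position("aware(p)"))                     # ('aware', 'p')
print(split_position("aware"))                        # ('aware', None)
```

Keeping the bare lemma and the marker apart is what allows the marker to survive as an attribute once the two entries are merged.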

As there are many similar cases (about 1000), this can't be corrected by hand.
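Finding the affected cases can at least be automated. A minimal sketch, assuming one lexfile is parsed as a single XML document with un-namespaced WN-LMF element names: group LexicalEntry ids by their (writtenForm, partOfSpeech) pair and report every pair that has more than one entry. The function find_split_entries and the trimmed SAMPLE (dc:identifier and relations dropped) are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed, hypothetical sample modelled on the entries in this issue.
SAMPLE = """<Lexicon>
  <LexicalEntry id="ewn-aware-p--a">
    <Lemma writtenForm="aware" partOfSpeech="a"/>
    <Sense id="ewn-aware-p--a-00191603-01" n="0" synset="ewn-00191603-a"/>
  </LexicalEntry>
  <LexicalEntry id="ewn-aware-a">
    <Lemma writtenForm="aware" partOfSpeech="a"/>
    <Sense id="ewn-aware-a-01984219-02" n="1" synset="ewn-01984219-a"/>
  </LexicalEntry>
  <LexicalEntry id="ewn-unaware-a">
    <Lemma writtenForm="unaware" partOfSpeech="a"/>
    <Sense id="ewn-unaware-a-00193091-01" n="0" synset="ewn-00193091-a"/>
  </LexicalEntry>
</Lexicon>"""


def find_split_entries(lexicon_xml: str) -> dict:
    """Map (writtenForm, partOfSpeech) -> entry ids, for keys with 2+ entries."""
    root = ET.fromstring(lexicon_xml)
    groups = defaultdict(list)
    for entry in root.iter("LexicalEntry"):
        lemma = entry.find("Lemma")
        key = (lemma.get("writtenForm"), lemma.get("partOfSpeech"))
        groups[key].append(entry.get("id"))
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


print(find_split_entries(SAMPLE))
# {('aware', 'a'): ['ewn-aware-p--a', 'ewn-aware-a']}
```

Running this per lexfile only counts clones within one file; entries split across lexfiles would need the same grouping over all files together.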

    <LexicalEntry id="ewn-aware-p--a">
      <Lemma writtenForm="aware" partOfSpeech="a"/>
      <Sense id="ewn-aware-p--a-00191603-01" n="0" synset="ewn-00191603-a" dc:identifier="aware%3:00:00::">
        <SenseRelation relType="derivation" target="ewn-awareness-n-05683749-01"/> <!-- awareness, consciousness, cognizance, cognisance, knowingness -->
        <SenseRelation relType="antonym" target="ewn-unaware-a-00193091-01"/> <!-- unaware, incognizant -->
      </Sense>
    </LexicalEntry>

    <LexicalEntry id="ewn-aware-a">
      <Lemma writtenForm="aware" partOfSpeech="a"/>
      <Sense id="ewn-aware-a-01984219-02" n="1" synset="ewn-01984219-a" dc:identifier="aware%3:00:04::">
        <SenseRelation relType="derivation" target="ewn-awareness-n-05683749-01"/> <!-- awareness, consciousness, cognizance, cognisance, knowingness -->
        <SenseRelation relType="derivation" target="ewn-awareness-n-05685793-01"/> <!-- awareness, sentience -->
      </Sense>
    </LexicalEntry>

The correct form would be:

    <LexicalEntry id="ewn-aware-a">
      <Lemma writtenForm="aware" partOfSpeech="a"/>
      <Sense id="ewn-aware-a-00191603-01" n="0" synset="ewn-00191603-a" dc:identifier="aware%3:00:00::">
        <SenseRelation relType="derivation" target="ewn-awareness-n-05683749-01"/> <!-- awareness, consciousness, cognizance, cognisance, knowingness -->
        <SenseRelation relType="antonym" target="ewn-unaware-a-00193091-01"/> <!-- unaware, incognizant -->
      </Sense>
      <Sense id="ewn-aware-a-01984219-02" n="1" synset="ewn-01984219-a" dc:identifier="aware%3:00:04::">
        <SenseRelation relType="derivation" target="ewn-awareness-n-05683749-01"/> <!-- awareness, consciousness, cognizance, cognisance, knowingness -->
        <SenseRelation relType="derivation" target="ewn-awareness-n-05685793-01"/> <!-- awareness, sentience -->
      </Sense>
    </LexicalEntry>

possibly with an adjposition="p" attribute to keep the position information.
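The merge itself can be sketched in Python with xml.etree.ElementTree. This is a swapped-in illustration, not the XSLT-based fix discussed in this thread, and it rests on several assumptions: suffixed entry ids follow the pattern seen here ('ewn-aware-p--a' = base 'ewn-aware' + marker 'p' + '-' + part of speech), sense ids start with their entry id, and only SenseRelation targets inside the same document are rewritten (references from other lexfiles or from SyntacticBehaviour would need the same treatment). The function merge_split_entries, the helper _reorder_children and the adjposition attribute name (taken from the suggestion above) are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern for ids produced by the flawed import,
# e.g. "ewn-aware-p--a" -> base "ewn-aware", adjpos "p", tail "-a".
SPLIT_ID_RE = re.compile(r"^(?P<base>.+)-(?P<adjpos>a|p|ip)-(?P<tail>-[a-z])$")


def _reorder_children(entry: ET.Element) -> None:
    """Keep Lemma etc. first, senses sorted by n, SyntacticBehaviour last."""
    kids = list(entry)
    senses = sorted((k for k in kids if k.tag == "Sense"),
                    key=lambda s: int(s.get("n", "0")))
    head = [k for k in kids if k.tag not in ("Sense", "SyntacticBehaviour")]
    tail = [k for k in kids if k.tag == "SyntacticBehaviour"]
    for k in kids:
        entry.remove(k)
    entry.extend(head + senses + tail)


def merge_split_entries(lexicon: ET.Element) -> dict:
    """Fold position-suffixed entries into their plain twins.

    Returns a map of old sense id -> new sense id.
    """
    by_id = {e.get("id"): e for e in lexicon.findall("LexicalEntry")}
    renames = {}
    for old_id, entry in list(by_id.items()):
        m = SPLIT_ID_RE.match(old_id)
        if m is None:
            continue
        new_id = m.group("base") + m.group("tail")
        target = by_id.get(new_id)
        for sense in entry.findall("Sense"):
            old_sid = sense.get("id")
            # Assumes sense ids are prefixed by their entry id.
            if old_sid.startswith(old_id):
                new_sid = new_id + old_sid[len(old_id):]
                sense.set("id", new_sid)
                renames[old_sid] = new_sid
            sense.set("adjposition", m.group("adjpos"))  # keep the marker
            if target is not None:
                entry.remove(sense)
                target.append(sense)
        if target is None:
            entry.set("id", new_id)  # no plain twin: just rename the entry
            by_id[new_id] = entry
        else:
            lexicon.remove(entry)
            _reorder_children(target)
    # Repoint sense relations at the renamed senses.
    for rel in lexicon.iter("SenseRelation"):
        if rel.get("target") in renames:
            rel.set("target", renames[rel.get("target")])
    return renames


SAMPLE = """<Lexicon>
  <LexicalEntry id="ewn-aware-p--a">
    <Lemma writtenForm="aware" partOfSpeech="a"/>
    <Sense id="ewn-aware-p--a-00191603-01" n="0" synset="ewn-00191603-a"/>
  </LexicalEntry>
  <LexicalEntry id="ewn-aware-a">
    <Lemma writtenForm="aware" partOfSpeech="a"/>
    <Sense id="ewn-aware-a-01984219-02" n="1" synset="ewn-01984219-a"/>
  </LexicalEntry>
  <LexicalEntry id="ewn-unaware-a">
    <Lemma writtenForm="unaware" partOfSpeech="a"/>
    <Sense id="ewn-unaware-a-00193091-01" n="0" synset="ewn-00193091-a">
      <SenseRelation relType="antonym" target="ewn-aware-p--a-00191603-01"/>
    </Sense>
  </LexicalEntry>
</Lexicon>"""

lexicon = ET.fromstring(SAMPLE)
renames = merge_split_entries(lexicon)
print(renames)  # {'ewn-aware-p--a-00191603-01': 'ewn-aware-a-00191603-01'}
print([e.get("id") for e in lexicon.findall("LexicalEntry")])  # ['ewn-aware-a', 'ewn-unaware-a']
```

On the sample this yields a single 'ewn-aware-a' entry whose senses come out in n order, matching the corrected layout above, with the predicative sense carrying adjposition="p".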

jmccrae commented 4 years ago

Didn't you already report this as #180?

1313ou commented 4 years ago

You're right. I had noted the extra garbage in the ids, but I hadn't then noted the implications: this goes against the very idea of a LexicalEntry, which drifts into a mere grouping of senses (lexical entries are split not only across different lexfiles but also within the same file).

That said, I managed to automate the fix (the merging) with XSLT 3.0, in what I think is an elegant way. I need some time for extra checks.

Shall I open a PR or keep it in my fork?

arademaker commented 4 years ago

A PR would be great, but I didn't follow this:

lexical entries are split not only across the different lexfiles but also within the same file

Can you elaborate on that? I take lexical entries to be word forms, i.e. lemmas, since wordnets usually do not deal with inflections.

jmccrae commented 4 years ago

I don't think we are against the idea of a LexicalEntry; it is just that, for the moment, we model things this way (a model derived from PWN). I agree that we should merge these entries and put an attribute on the <Sense> to carry this information. I have made a pull request to the schemas project proposing this:

https://github.com/globalwordnet/schemas/pull/9

@1313ou, if you are able to make a PR for this issue, that would be super.

Also, as a matter of housekeeping, could we close this issue and work on this as #180?