globalwordnet / schemas

WordNet-LMF formats
https://globalwordnet.github.io/schemas/

Syntactic behaviour should be better modelled #8

Status: Closed (jmccrae closed this issue 3 years ago)

jmccrae commented 4 years ago

@fcbond

Why don't we add syntactic behavior to senses (and possibly synsets), which is where it is in PWN? It should not be on the lexical entry, ...

jmccrae commented 3 years ago

I would propose the following

For example

<Lexicon>
  <SyntacticBehaviour id="transitive" subcategorizationFrame="Someone %s something"/>
  <LexicalEntry id="ewn-do-v" syntacticBehaviour="transitive">
     ...
    <!-- Ideally we either indicate syntactic behaviour on the entry OR the sense... no need to do both -->
    <Sense id="sense1" syntacticBehaviour="transitive"/>
  </LexicalEntry>
</Lexicon>

fcbond commented 3 years ago

I think <Synset> and <Sense> should have syntacticBehaviour, not <Sense> and <LexicalEntry>.

Otherwise I agree (although can we call it subCat, to make it easier to fit things on our screens?).

lmorgadodacosta commented 3 years ago

Correct me if I am wrong, but a single sense should be able to have multiple values for SyntacticBehaviour. See, for example, 'give' in 02199590-v (OMW). This being the case, wouldn't it be preferable to use nested elements instead of an attribute?

1313ou commented 3 years ago

Use IDREFS (note the S), so that a sense can have multiple verb frames.

jmccrae commented 3 years ago

Yes, I was proposing using IDREFS to give multiple links.

We could certainly use subCat as the attribute name... shorter can be better
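
As an illustration of the IDREFS idea, here is a minimal Python sketch of how a tool might resolve several frame references on one sense. The ids, the frame strings, and the function name `frames_by_sense` are made up for the example; the attribute name `subcat` and the placement of `<SyntacticBehaviour>` under `<Lexicon>` follow what is proposed in this thread, not a confirmed release of the schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: ids and frame strings are invented for illustration.
DOC = """
<Lexicon id="ex" label="Example">
  <SyntacticBehaviour id="frame-sb-sth" subcategorizationFrame="Somebody %s something"/>
  <SyntacticBehaviour id="frame-sb-sb-sth" subcategorizationFrame="Somebody %s somebody something"/>
  <LexicalEntry id="ex-give-v">
    <Sense id="ex-give-v-1" subcat="frame-sb-sth frame-sb-sb-sth"/>
  </LexicalEntry>
</Lexicon>
"""


def frames_by_sense(xml_text):
    """Map each sense id to the frame strings its subcat attribute refers to."""
    lexicon = ET.fromstring(xml_text)
    # Index the frames declared directly under <Lexicon> by their id.
    frames = {
        sb.get("id"): sb.get("subcategorizationFrame")
        for sb in lexicon.findall("SyntacticBehaviour")
    }
    result = {}
    for sense in lexicon.iter("Sense"):
        # An IDREFS value is a whitespace-separated list of ids.
        refs = sense.get("subcat", "").split()
        result[sense.get("id")] = [frames[ref] for ref in refs]
    return result


if __name__ == "__main__":
    print(frames_by_sense(DOC))
```

A single attribute keeps the sense element compact; the trade-off against nested elements is that a frame reference cannot carry any extra information of its own.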

goodmami commented 3 years ago

Two things here:

  1. (emphasis added)

    The tag <SyntacticBehaviour> can now also appear under the <Lexicon> tag

    The <LexicalEntry> element can now (in #29) take a subcat attribute. Why should we continue to allow <SyntacticBehaviour> elements to be defined in <LexicalEntry> elements? If you're concerned about backward compatibility, can we at least deprecate the old pattern (e.g., document it as such, tools can generate a warning) and then properly remove it in the future?

  2. Actually, what is the purpose of allowing subcat frames on both lexical entries and senses? Is the intuition that a frame on a lexical entry is shared by all its senses? If so, instead of adding this layer of interpretation onto the data, why don't we just be explicit and specify the frames on senses only?
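
The "tools can generate a warning" idea in point 1 could look roughly like the following Python sketch. It is only an assumption about how a loader might behave; `check_deprecated_frames` is a hypothetical name and no existing tool is claimed to work this way.

```python
import warnings
import xml.etree.ElementTree as ET


def check_deprecated_frames(xml_text):
    """Return ids of entries that still define <SyntacticBehaviour> inline."""
    lexicon = ET.fromstring(xml_text)
    flagged = []
    for entry in lexicon.findall("LexicalEntry"):
        # The old pattern: a frame element nested inside the entry itself.
        if entry.find("SyntacticBehaviour") is not None:
            entry_id = entry.get("id")
            flagged.append(entry_id)
            warnings.warn(
                f"<SyntacticBehaviour> inside <LexicalEntry id={entry_id!r}> "
                "is the old pattern; define it under <Lexicon> instead",
                DeprecationWarning,
                stacklevel=2,
            )
    return flagged
```

The file still loads; the warning only tells the maintainer which entries to migrate before the old pattern is removed.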

fcbond commented 3 years ago

I agree that they should only be specified on senses or synsets (where all senses in the synset share the same syntactic behaviour).

arademaker commented 3 years ago

Why not only on senses, to avoid extra confusion, even if all senses of a given synset have the same syntactic behavior?

goodmami commented 3 years ago

[...] or synsets (where all senses in the synset share the same syntactic behaviour).

Whether on <LexicalEntry> or <Synset>, this kind of interpretation needs to be implemented by the software and isn't explicit in the data.
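
To make the point concrete, this is the kind of expansion a tool would have to perform if frames were allowed on the entry. It is a sketch under an explicit assumption: inherited and sense-level ids are merged as a union. Nothing in the data says so, and another tool could just as reasonably let the sense override the entry. The function name `effective_subcat` and the ids are hypothetical.

```python
import xml.etree.ElementTree as ET


def effective_subcat(entry):
    """Expand entry-level frame ids onto every sense of the entry.

    Assumption: a sense's own ids are added to the inherited ones (a union).
    """
    inherited = entry.get("subcat", "").split()
    expanded = {}
    for sense in entry.findall("Sense"):
        own = sense.get("subcat", "").split()
        # dict.fromkeys drops duplicate ids while preserving their order.
        expanded[sense.get("id")] = list(dict.fromkeys(inherited + own))
    return expanded


# Hypothetical entry: ids are invented for illustration.
entry = ET.fromstring(
    '<LexicalEntry id="e1" subcat="transitive">'
    '<Sense id="s1"/>'
    '<Sense id="s2" subcat="ditransitive transitive"/>'
    '</LexicalEntry>'
)

if __name__ == "__main__":
    print(effective_subcat(entry))
```

With subcat only on `<Sense>`, none of this logic is needed: the frames of a sense are exactly the ids written on it.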

To be more precise, here's what I (and @arademaker, it seems) am proposing (subcat only on <Sense>) for LMF:

--- a/WN-LMF-1.0.dtd
+++ b/WN-LMF-1.0.dtd
@@ -2,7 +2,7 @@
 <!ELEMENT LexicalResource (Lexicon+)>
 <!ATTLIST LexicalResource
     xmlns:dc CDATA #FIXED "http://purl.org/dc/elements/1.1/">
-<!ELEMENT Lexicon (LexicalEntry+, Synset*)>
+<!ELEMENT Lexicon (LexicalEntry+, Synset*, SyntacticBehaviour*)>
 <!ATTLIST Lexicon
     id ID #REQUIRED
     label CDATA #REQUIRED
@@ -29,7 +29,7 @@
     status CDATA #IMPLIED
     note CDATA #IMPLIED
     confidenceScore CDATA "1.0">
-<!ELEMENT LexicalEntry (Lemma, Form*, Sense*, SyntacticBehaviour*)>
+<!ELEMENT LexicalEntry (Lemma, Form*, Sense*)>
 <!ATTLIST LexicalEntry
     id ID #REQUIRED
     dc:contributor CDATA #IMPLIED
@@ -83,7 +83,8 @@
     note CDATA #IMPLIED
     confidenceScore CDATA #IMPLIED
     lexicalized (true|false) "true"
-    adjposition (a|ip|p) #IMPLIED>
+    adjposition (a|ip|p) #IMPLIED
+    subcat IDREFS #IMPLIED>
 <!ELEMENT Synset (Definition*, ILIDefinition?, SynsetRelation*, Example*)>
 <!ATTLIST Synset
     id ID #REQUIRED
@@ -211,6 +212,7 @@
     confidenceScore CDATA #IMPLIED>
 <!ELEMENT SyntacticBehaviour EMPTY>
 <!ATTLIST SyntacticBehaviour
+  id ID #REQUIRED
   subcategorizationFrame CDATA #REQUIRED
   senses IDREFS #IMPLIED>
 <!ELEMENT Count (#PCDATA)>

jmccrae commented 3 years ago

There are other models like OntoLex/LMF, which model syntactic behaviour solely on the entry level. However, for the moment, I only know of wordnets that model this on the sense level so we can introduce this modelling in v1.1. If there is a demand for modelling at the entry level too later, we can easily add this.

I have updated the PR.

jmccrae commented 3 years ago

NB. Small note on @goodmami's version: I think that, to keep backwards compatibility, we should still allow <SyntacticBehaviour> to appear under <LexicalEntry>.

goodmami commented 3 years ago

I think to keep backwards compatibility we should still allow <SyntacticBehaviour> to appear under <LexicalEntry>

Fair enough. Is this equivalent to putting it under <Lexicon>? That is, it only introduces a syntactic behavior that we can refer to, and doesn't carry any meaning about it being associated with the <LexicalEntry>?

And relatedly, do we have a process for breaking backward compatibility (e.g., "deprecate, then remove after a year")? If we keep everything backward compatible, the format will accumulate a lot of cruft.

jmccrae commented 3 years ago

On backwards compatibility, I think we should go by version numbering: e.g., 1.x is fully backwards compatible with 1.y (where x > y), but 2.0 can introduce breaking changes.
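
One way a tool could apply this rule is sketched below in Python. It assumes versions are written as "major.minor" and that a tool built for one major version makes no promise about documents from another; the function name `can_read` is hypothetical.

```python
def can_read(tool_version, doc_version):
    """True if a tool built for tool_version should accept doc_version.

    Assumption: versions are "major.minor" strings, and compatibility is
    only promised within the same major version.
    """
    tool_major, tool_minor = (int(part) for part in tool_version.split("."))
    doc_major, doc_minor = (int(part) for part in doc_version.split("."))
    # Within a major version, a newer schema accepts every older document.
    return tool_major == doc_major and doc_minor <= tool_minor
```

For instance, `can_read("1.1", "1.0")` holds, while `can_read("1.0", "1.1")` and `can_read("2.0", "1.1")` do not.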

fcbond commented 3 years ago

That sounds good to me.

jmccrae commented 3 years ago

Closed by #38