Closed jmccrae closed 3 years ago
I would propose the following: <SyntacticBehaviour> can now also appear under the <Lexicon> tag, and <Sense> and <LexicalEntry> can refer to syntactic behaviours by ID.
For example:
<Lexicon>
<SyntacticBehaviour id="transitive" subcategorizationFrame="Someone %s something"/>
<LexicalEntry id="ewn-do-v" syntacticBehaviour="transitive">
...
<!-- Ideally we either indicate syntactic behaviour on the entry OR the sense... no need to do both -->
<Sense id="sense1" syntacticBehaviour="transitive"/>
</LexicalEntry>
</Lexicon>
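As a rough illustration of what referring to a syntactic behaviour by ID could look like on the consumer side, here is a minimal Python sketch using only the standard library. The sample document mirrors the example above with the elided content left out; the function name frames_by_sense is hypothetical and not part of any existing tool.

```python
import xml.etree.ElementTree as ET

# Sample modelled on the proposal above; the elided "..." content is omitted.
SAMPLE = """
<Lexicon>
  <SyntacticBehaviour id="transitive" subcategorizationFrame="Someone %s something"/>
  <LexicalEntry id="ewn-do-v">
    <Sense id="sense1" syntacticBehaviour="transitive"/>
  </LexicalEntry>
</Lexicon>
"""


def frames_by_sense(lexicon_xml):
    """Map each sense ID to the frame text its syntacticBehaviour attribute points at."""
    root = ET.fromstring(lexicon_xml.strip())
    # Index the Lexicon-level declarations by their id attribute.
    frames = {
        sb.get("id"): sb.get("subcategorizationFrame")
        for sb in root.findall("SyntacticBehaviour")
    }
    resolved = {}
    for sense in root.iter("Sense"):
        ref = sense.get("syntacticBehaviour")
        if ref is not None:
            resolved[sense.get("id")] = frames[ref]
    return resolved


print(frames_by_sense(SAMPLE))
```

The sketch only handles a single ID per attribute, as in the example above.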
I think <Synset> and <Sense> should have syntacticBehaviour, not <Sense> and <LexicalEntry>; otherwise I agree (although can we call it subCat to make it easier to fit things on our screens)?
Correct me if I am wrong, but a single sense should be able to have multiple values for SyntacticBehaviour.
See, for example, here: 'give' in 02199590-v (OMW).
This being the case, wouldn't it be preferable to use nested elements instead of an attribute?
Use IDREFS (note the S), meaning a sense can have multiple verb frames.
Yes, I was proposing using IDREFS to give multiple links.
We could certainly use subCat as the attribute name... shorter can be better.
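To make the IDREFS point concrete: an IDREFS value is a whitespace-separated list of IDs, so one attribute can carry several frame references. The Python sketch below splits such a value and reports references that do not match any declared <SyntacticBehaviour>. The attribute name subcat follows the suggestion in this thread; the frame IDs, frame strings, entry and sense IDs, and the function name dangling_refs are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: IDs and frame strings are invented for this example.
SAMPLE = """
<Lexicon>
  <SyntacticBehaviour id="ditransitive" subcategorizationFrame="Somebody %s somebody something"/>
  <SyntacticBehaviour id="to-dative" subcategorizationFrame="Somebody %s something to somebody"/>
  <LexicalEntry id="example-give-v">
    <Sense id="example-give-1" subcat="ditransitive to-dative"/>
    <Sense id="example-give-2" subcat="ditransitive no-such-frame"/>
  </LexicalEntry>
</Lexicon>
"""


def dangling_refs(lexicon_xml):
    """Return {sense_id: [frame IDs that are referenced but never declared]}."""
    root = ET.fromstring(lexicon_xml.strip())
    declared = {sb.get("id") for sb in root.findall("SyntacticBehaviour")}
    problems = {}
    for sense in root.iter("Sense"):
        # str.split() with no argument splits on any run of whitespace,
        # which matches how an IDREFS value is tokenized.
        refs = sense.get("subcat", "").split()
        missing = [ref for ref in refs if ref not in declared]
        if missing:
            problems[sense.get("id")] = missing
    return problems


print(dangling_refs(SAMPLE))
```

A validating XML parser would catch dangling IDREFS on its own; the sketch is only meant for tools that read the file without DTD validation.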
Two things here:
1. (emphasis added) "The tag <SyntacticBehaviour> can now also appear under the <Lexicon> tag"
The <LexicalEntry> element can now (in #29) take a subcat attribute. Why should we continue to allow <SyntacticBehaviour> elements to be defined in <LexicalEntry> elements? If you're concerned about backward compatibility, can we at least deprecate the old pattern (e.g., document it as such, tools can generate a warning) and then properly remove it in the future?
2. Actually, what is the purpose of allowing subcat frames on both lexical entries and senses? Is the intuition that a frame on a lexical entry is shared by all its senses? If so, instead of adding this layer of interpretation onto the data, why don't we just be explicit and specify the frames on senses only?
I agree that they should only be specified on senses or synsets (where all senses in the synset share the same syntactic behaviour).
-- Francis Bond http://www3.ntu.edu.sg/home/fcbond/ Division of Linguistics and Multilingual Studies Nanyang Technological University
Why not only on senses, to avoid extra confusion? Even if all senses of a given synset have the same syntactic behavior.
[...] or synsets (where all senses in the synset share the same syntactic behaviour).
Whether on <LexicalEntry> or <Synset>, this kind of interpretation needs to be implemented by the software and isn't explicit in the data.
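To illustrate the extra interpretation step being discussed, here is a Python sketch of what a tool would have to do if frames were allowed on <LexicalEntry> as well as on <Sense>. It assumes one possible reading, namely that entry-level frames are inherited by every sense of the entry; the thread does not settle that semantics, and the IDs and the function name effective_frames are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: subcat appears on both the entry and one of its senses.
SAMPLE = """
<Lexicon>
  <LexicalEntry id="e1" subcat="intransitive">
    <Sense id="s1"/>
    <Sense id="s2" subcat="transitive"/>
  </LexicalEntry>
</Lexicon>
"""


def effective_frames(lexicon_xml):
    """Compute per-sense frame IDs, assuming senses inherit entry-level frames."""
    root = ET.fromstring(lexicon_xml.strip())
    result = {}
    for entry in root.iter("LexicalEntry"):
        inherited = entry.get("subcat", "").split()
        for sense in entry.iter("Sense"):
            own = sense.get("subcat", "").split()
            # dict.fromkeys removes duplicates while keeping first-seen order.
            result[sense.get("id")] = list(dict.fromkeys(inherited + own))
    return result


print(effective_frames(SAMPLE))
```

With frames specified on senses only, this merging step disappears and the per-sense list is simply whatever the sense's own attribute says.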
To be more precise, here's what I (and @arademaker, it seems) am proposing (subcat only on <Sense>) for LMF:
--- a/WN-LMF-1.0.dtd
+++ b/WN-LMF-1.0.dtd
@@ -2,7 +2,7 @@
<!ELEMENT LexicalResource (Lexicon+)>
<!ATTLIST LexicalResource
xmlns:dc CDATA #FIXED "http://purl.org/dc/elements/1.1/">
-<!ELEMENT Lexicon (LexicalEntry+, Synset*)>
+<!ELEMENT Lexicon (LexicalEntry+, Synset*, SyntacticBehaviour*)>
<!ATTLIST Lexicon
id ID #REQUIRED
label CDATA #REQUIRED
@@ -29,7 +29,7 @@
status CDATA #IMPLIED
note CDATA #IMPLIED
confidenceScore CDATA "1.0">
-<!ELEMENT LexicalEntry (Lemma, Form*, Sense*, SyntacticBehaviour*)>
+<!ELEMENT LexicalEntry (Lemma, Form*, Sense*)>
<!ATTLIST LexicalEntry
id ID #REQUIRED
dc:contributor CDATA #IMPLIED
@@ -83,7 +83,8 @@
note CDATA #IMPLIED
confidenceScore CDATA #IMPLIED
lexicalized (true|false) "true"
- adjposition (a|ip|p) #IMPLIED>
+ adjposition (a|ip|p) #IMPLIED
+ subcat IDREFS #IMPLIED>
<!ELEMENT Synset (Definition*, ILIDefinition?, SynsetRelation*, Example*)>
<!ATTLIST Synset
id ID #REQUIRED
@@ -211,6 +212,7 @@
confidenceScore CDATA #IMPLIED>
<!ELEMENT SyntacticBehaviour EMPTY>
<!ATTLIST SyntacticBehaviour
+ id ID #REQUIRED
subcategorizationFrame CDATA #REQUIRED
senses IDREFS #IMPLIED>
<!ELEMENT Count (#PCDATA)>
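For existing files, the change in the diff above could be applied mechanically. The following Python sketch moves <SyntacticBehaviour> elements from under <LexicalEntry> to the <Lexicon> level, gives each distinct frame an ID, and writes the references into a subcat attribute on the listed senses. The generated IDs (frame-1, frame-2, ...), the sample data, and the function name hoist_frames are assumptions for illustration; this is not an official migration tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical 1.0-style input: the frame is nested under the entry and
# points at senses through its senses attribute.
OLD = """
<Lexicon id="example" label="Example">
  <LexicalEntry id="e1">
    <Sense id="s1"/>
    <Sense id="s2"/>
    <SyntacticBehaviour subcategorizationFrame="Somebody %s something" senses="s1 s2"/>
  </LexicalEntry>
</Lexicon>
"""


def hoist_frames(lexicon_xml):
    """Return the Lexicon element with frames hoisted and senses given subcat."""
    root = ET.fromstring(lexicon_xml.strip())
    senses = {s.get("id"): s for s in root.iter("Sense")}
    frame_ids = {}  # frame text -> generated ID (made-up naming scheme)
    for entry in root.findall("LexicalEntry"):
        # findall returns a list, so removing children while looping is safe.
        for sb in entry.findall("SyntacticBehaviour"):
            text = sb.get("subcategorizationFrame")
            fid = frame_ids.setdefault(text, f"frame-{len(frame_ids) + 1}")
            # Frames without a senses attribute are hoisted but left
            # unreferenced in this sketch.
            for sid in sb.get("senses", "").split():
                refs = senses[sid].get("subcat", "").split()
                if fid not in refs:
                    refs.append(fid)
                senses[sid].set("subcat", " ".join(refs))
            entry.remove(sb)
    # Append declarations last, matching (LexicalEntry+, Synset*, SyntacticBehaviour*).
    for text, fid in frame_ids.items():
        ET.SubElement(root, "SyntacticBehaviour", id=fid, subcategorizationFrame=text)
    return root


print(ET.tostring(hoist_frames(OLD), encoding="unicode"))
```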
There are other models like OntoLex/LMF, which model syntactic behaviour solely on the entry level. However, for the moment, I only know of wordnets that model this on the sense level so we can introduce this modelling in v1.1. If there is a demand for modelling at the entry level too later, we can easily add this.
I have updated the PR.
NB. Small note on @goodmami's version. I think to keep backwards compatibility we should still allow <SyntacticBehaviour> to appear under <LexicalEntry>.
"I think to keep backwards compatibility we should still allow <SyntacticBehaviour> to appear under <LexicalEntry>"
Fair enough. Is this equivalent to putting it under <Lexicon>? That is, it only introduces a syntactic behavior that we can refer to, and doesn't carry any meaning about it being associated with the <LexicalEntry>?
And relatedly, do we have a process for breaking backward compatibility (e.g., "deprecate, then remove after a year")? If we keep everything backward compatible, the format will accumulate a lot of cruft.
On backwards compatibility, I think we should go by version numbering. e.g., 1.x is fully backwards compatible with 1.y (where x > y) but 2.0 can introduce breaking changes.
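The versioning rule above can be written down as a small check. This Python sketch assumes two-part version strings such as "1.1" and a hypothetical function name can_read. It treats any difference in major version as incompatible, which is a conservative assumption: the rule only says that 2.0 may introduce breaking changes, not that it must.

```python
def can_read(reader_version, document_version):
    """True if a tool built for reader_version should accept document_version.

    Rule sketched here: same major version, and the reader's minor version
    is at least the document's minor version.
    """
    reader_major, reader_minor = (int(part) for part in reader_version.split("."))
    doc_major, doc_minor = (int(part) for part in document_version.split("."))
    return reader_major == doc_major and reader_minor >= doc_minor


print(can_read("1.1", "1.0"))  # a 1.1 reader given a 1.0 document
print(can_read("1.0", "1.1"))  # a 1.0 reader given a 1.1 document
print(can_read("2.0", "1.1"))  # a 2.0 reader given a 1.1 document
```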
That sounds good to me.
Closed by #38