globalwordnet / gwadoc

documentation for things like relations and parts of speech used by wordnets
https://globalwordnet.github.io/gwadoc/
Creative Commons Attribution 4.0 International

Add some new relations #57

Open fcbond opened 3 years ago

fcbond commented 3 years ago

We want to add some new relations, all of which are already in use by some wordnets.

Most of these are used by the Polish Wordnet Project, some by Czech and Bulgarian.

fcbond commented 3 years ago

@ewa-rudnicka and @fcbond are still hashing out if some of the relations should be symmetric or appear in pairs. Probably only the synonym and antonym relations should be symmetric.
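The symmetric-versus-paired distinction can be made concrete with a reverse-relation table. This is only a minimal Python sketch, not gwadoc's actual data model: the paired names are ones that come up in this issue, treating `ir_synonym` as the symmetric synonym relation is an assumption, and `REVERSE`, `is_symmetric` and `check_reverses` are hypothetical helpers.

```python
# A symmetric relation is its own reverse; a paired relation points at a
# differently named partner. Names come from this thread; marking ir_synonym
# as symmetric is an assumption made for illustration only.
REVERSE = {
    "antonym": "antonym",
    "ir_synonym": "ir_synonym",
    "state_of": "be_in_state",
    "be_in_state": "state_of",
    "restricts": "restricted_by",
    "restricted_by": "restricts",
    "entails": "is_entailed_by",
    "is_entailed_by": "entails",
}


def is_symmetric(relation):
    """True when the relation is listed as its own reverse."""
    return REVERSE[relation] == relation


def check_reverses(table):
    """Return relations whose reverse does not point back at them."""
    return [rel for rel, rev in table.items() if table.get(rev) != rel]
```

With a table like this, deciding that a relation should appear in a pair rather than be symmetric is a one-line change, and `check_reverses` catches a partner that was forgotten.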

fcbond commented 3 years ago

Basic documentation added by b4dbc8c

jmccrae commented 3 years ago

A couple of comments on the implementation of these new relationships:

  1. Could we lose the _of from property names like augmentative_of? This would fit the style better
  2. simple_aspect_pi and secondary_aspect_pi are marked as not directly applicable. So what is the point of them??
  3. ir_synonym is allowed between senses, but its parent relationship is not. This is a contradiction. I think that ir_synonym should not be allowed between senses
  4. Similarly, simple_aspect_ip and secondary_aspect_ip are allowed between synsets, but not between its parent derivation. Again this is contradictory and we should probably not allow this.

goodmami commented 3 years ago

A couple of comments on the implementation of these new relationships:

  1. Could we lose the _of from property names like augmentative_of? This would fit the style better

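I also prefer without _of, but I'm not sure we have a consistent style. Here are some existing relations with of:

  • state_of ⇔ be_in_state
  • subevent ⇔ is_subevent_of
  • manner_of ⇔ in_manner

Similarly, these by ones are inconsistent:

  • restricts ⇔ restricted_by
  • entails ⇔ is_entailed_by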

Maybe we could use some relation naming guidelines.
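As a starting point for such guidelines, the inconsistent affixes can be listed mechanically. This is a minimal Python sketch under a hypothetical guideline (flag an `is_` prefix and `_of`/`_by` suffixes). The rule itself is an assumption, not something agreed in this issue, and `affixes` is an illustrative helper, not part of gwadoc.

```python
# Relation names mentioned in this thread.
NAMES = [
    "state_of", "be_in_state", "subevent", "is_subevent_of",
    "manner_of", "in_manner", "restricts", "restricted_by",
    "entails", "is_entailed_by", "augmentative_of",
]

# Hypothetical guideline: these affixes are flagged (an assumption, not an agreed rule).
PREFIXES = ("is_",)
SUFFIXES = ("_of", "_by")


def affixes(name):
    """Return the flagged prefixes and suffixes found on a relation name."""
    found = [p for p in PREFIXES if name.startswith(p)]
    found += [s for s in SUFFIXES if name.endswith(s)]
    return found


# Map each flagged name to the affixes it carries.
flagged = {name: affixes(name) for name in NAMES if affixes(name)}
```

Printing `flagged` shows at a glance which members of each pair carry an affix, which is the inconsistency a renaming round would need to resolve.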

  2. simple_aspect_pi and secondary_aspect_pi are marked as not directly applicable. So what is the point of them??

"(not directly applicable)" is what is shown by the HTML templates when a relation is not marked as a synset-synset, sense-sense, or sense-synset relation (e.g., for constitutive). My guess is that someone forgot to mark these relations as such. Otherwise I'd agree that there's no point. I think for "constitutive", which is not defined in the DTD, it is only there to group related relations as a supertype. Maybe it should go?
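The fallback behaviour described here can be sketched as follows. This is a rough Python illustration only: the real gwadoc templates and field names may differ, and `FORMS` and `applicability` are hypothetical names.

```python
NOT_APPLICABLE = "(not directly applicable)"

# The three levels a relation can be marked for, as named in the comment above.
FORMS = ("synset-synset", "sense-sense", "sense-synset")


def applicability(marked):
    """Return the label shown for a relation, given the levels it is marked for.

    Hypothetical helper: a relation with no marks at all (a forgotten mark, or
    a pure grouping supertype such as constitutive) gets the fallback text.
    """
    labels = [form for form in FORMS if form in marked]
    return ", ".join(labels) if labels else NOT_APPLICABLE
```

Under this reading, an unmarked relation and a deliberate grouping supertype look identical in the rendered page, which is why the missing marks were easy to overlook.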

  3. ir_synonym is allowed between senses, but its parent relationship is not. This is a contradiction. I think that ir_synonym should not be allowed between senses

I see the logical argument, and also I think it's easier to start with a tight schema then loosen it later than vice versa. However, in the Japanese Wordnet we have words sharing a synset where some are interregister synonyms (e.g., 召し上がる, 召しあがる, 召上る (yes, three of them), 召される, 召す, 上がる, 食事, 食む, 食らう, 食う, 食べる, 食する, and 頂く all share a synset). These cannot be modeled with a synset-only relation. But maybe adapting the schema to the data we have is putting the cart before the horse and we should instead change the data (e.g., splitting those into different synsets)?

  4. Similarly, simple_aspect_ip and secondary_aspect_ip are allowed between synsets, but not between its parent derivation. Again this is contradictory and we should probably not allow this.

Ditto my first sentence from (3) above. I don't have any opinion or counterexamples otherwise.
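The contradiction raised in (3) and (4) is the kind of thing a small check could catch: a relation should not be allowed at a level its parent is not allowed at. A minimal Python sketch, assuming each relation records its parent and its allowed levels; the `Relation` class and `level_conflicts` are hypothetical, and the level assignments below only mirror what is described in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Relation:
    name: str
    levels: set            # e.g. {"sense-sense", "synset-synset"}
    parent: Optional[str] = None


def level_conflicts(relations):
    """List (child, level, parent) triples where the child allows a level its parent does not."""
    by_name = {r.name: r for r in relations}
    conflicts = []
    for rel in relations:
        parent = by_name.get(rel.parent) if rel.parent else None
        if parent is None:
            continue
        for level in sorted(rel.levels - parent.levels):
            conflicts.append((rel.name, level, parent.name))
    return conflicts


# State as described in point 4: the child is synset-level, the parent is not.
before = [
    Relation("derivation", {"sense-sense"}),
    Relation("simple_aspect_ip", {"synset-synset"}, parent="derivation"),
]
# One possible fix: restrict the child to the parent's level.
after = [
    Relation("derivation", {"sense-sense"}),
    Relation("simple_aspect_ip", {"sense-sense"}, parent="derivation"),
]
```

The same check would apply to ir_synonym and its parent. Whether a conflict is resolved by tightening the child or loosening the parent remains a modelling decision.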

I have another related concern:

  5. I find the _form relations (feminine_form, masculine_form, etc.) slightly odd. Why are we talking about "form" at the synset level? That seems like a word (or sense) thing. Why not just feminine, masculine, etc.?

jmccrae commented 3 years ago

Yes, for 1 and 5 I also prefer shorter names.

On 3, I think that this is something that should be fixed in Japanese WordNet. Typically, register changes (e.g., 'bloke' vs 'man') are different synsets in wordnets.

fcbond commented 3 years ago

Hi,

CCing Ewa in case she is not watching the GitHub issue.

In reply to Michael Wayne Goodman's comment of Tue, Jan 12, 2021 at 12:20 AM:

On 1 (losing _of and agreeing on relation naming guidelines): Actually I have been thinking that for the next round we should try to do a consistent renaming (like EuroWordNet did). Until then, I guess I am OK with trying to be consistent and going for shorter here (I am assuming we can update the paper after the conference).

On 2 (simple_aspect_pi and secondary_aspect_pi shown as not directly applicable): Michael is correct, it was a bug, and I have fixed it.

On 3 (ir_synonym between senses): I agree for the Japanese wordnet, we should definitely split these. I was trying to also allow the schema to model the Polish Wordnet, which treats these all as sense-level relations, but as I think that for them synset is not a basic relation, maybe we should not worry about it? In which case I agree that sense-level is not needed. Ewa?

On 4 (simple_aspect_ip and secondary_aspect_ip between synsets): Ewa?

On 5 (the _form relations): In general, the link names are nominals, not adjectives, and I was trying to be consistent, ... I could be persuaded to lose the form for all of these if everyone agrees.

-- Francis Bond http://www3.ntu.edu.sg/home/fcbond/ Division of Linguistics and Multilingual Studies Nanyang Technological University

fcbond commented 3 years ago

Hi,

I had a chat with Ewa:

On 1 (losing _of): We are both ok with losing the '_of'.

On 2 (simple_aspect_pi and secondary_aspect_pi): I have also marked them both to be just for sense-sense level relations.

On 3 (ir_synonym): I have also changed these to sense-sense.

On 5 (the _form relations): I think diminutive and augmentative are also ok as just sense/sense.

For male/female/young, I would like to capture relations like 'King Queen', 'Kangaroo Joey' as well as 'Prince Princess' and 'Pig Piglet', so I would like to leave it at

I guess if we thought it important we could have two different sets of relations, but I would prefer to leave that for another revision.

I have checked in the changes to the synset/sense level issues in the documentation.

Do we want the names to be:

  • young/has_young
  • male/has_male (to emphasize the concept)
  • female/has_female (to emphasize the concept)
  • diminuative/has_diminuative
  • augmentative/has_augmentative

If everyone agrees I will make this change too.


fcbond commented 3 years ago

Hi,

On 1 (consistent renaming in the next round): I would also opt for consistent renaming and, where possible, shorter forms.

On losing the '_of': Yes.

On 2 (simple_aspect_pi and secondary_aspect_pi): Both aspectual relations are derivationally based and are sense-sense relations in plWordNet, and I would opt to keep them as such generally. "Constitutive" groups relations that define synset membership in plWordNet, such as hypo/hypernymy and mero/holonymy, which are synset-synset, and antonymy, which is sense-sense.

On 3 (ir_synonym): Inter-register synonymy is a synset-synset relation in plWordNet and, as I understood it, Francis and I agreed to keep it as such for the general model, too. It need not be derivationally based, as most other new relations that we are adding now are.

On 4 (simple_aspect_ip and secondary_aspect_ip): As written above, both aspectual relations are sense-sense in plWordNet because they are derivationally based, so their parent is derivation, also a sense-level one, and I would keep to that in our format, too.

On 5 (diminutive and augmentative as just sense/sense): Yes, because again they are derivationally based (at least in Polish, but in other Slavic languages, too).

On male/female/young ('So I would like to leave it at'): I guess it was supposed to be at synset level, because they are not always derivationally based and we want to note all instances of such relations.

On the proposed names (young/has_young, male/has_male, female/has_female, diminuative/has_diminuative, augmentative/has_augmentative): I am ok with that, only it should be 'diminutive'.

Best, Ewa


goodmami commented 3 years ago

Just confirming that ir_synonym is changed to synset-synset and not sense-sense as written above?

Also, I see that you've changed masculine to male and feminine to female. My qualm was with using the _form suffix, not with the masculine/feminine part. There are times that the grammatical gender does not match the natural gender (for example, from Wikipedia, cailín "girl" (Irish) is masculine), so male/female might be problematic. Which are we annotating?

fcbond commented 3 years ago

Hi,

In reply to Michael Wayne Goodman's comment of Thu, Jan 14, 2021 at 3:39 PM:

On ir_synonym being synset-synset and not sense-sense: Yes, sorry for the confusion.

On male/female versus masculine/feminine, where grammatical gender need not match natural gender (https://en.wikipedia.org/wiki/Grammatical_gender#Grammatical_gender_need_not_match_natural_gender): We are annotating natural gender, which is why I suggested the change.

I guess if we wanted to have the sense-sense separate from synset-synset, then the former would be masculine/feminine and the latter male/female, ....


fcbond commented 3 years ago

I changed the names in the paper, and will update the PR on the weekend.

John, can you submit the final version to Alexandre, or would you like me to do it?


jmccrae commented 3 years ago

Will do