globalwordnet / schemas

WordNet-LMF formats
https://globalwordnet.github.io/schemas/

WordNet part-of-speech vs Ontolex part-of-speech #58

Closed simongray closed 2 years ago

simongray commented 3 years ago

Scanning through the Turtle file, I noticed that you define your own POS relations and classes rather than use the lexinfo:partOfSpeech relation which is heavily used in the Ontolex specification, which I understand that @jmccrae helped bring to life. I'm unsure why this is the case?

In the Ontolex specification it is specifically stated that

the model abstracts from specific linguistic theory or category systems used to describe the linguistic properties of lexical entries and their syntactic behavior, encouraging reuse of existing data category systems or linguistic ontologies.

I think that this is an excellent ideal as it makes integration of existing datasets mostly a matter of merging sets of triples. The second best option would be having some kind of derived lexinfo relation triple which can be inferred via equivalent/subclass relations.

Unfortunately, the GWA schema's partOfSpeech relation and PartOfSpeech class are proprietary and not linked to any external definitions. I have used Ontolex as the basis for the new version of DanNet, so my part-of-speech tags are all defined using the lexinfo:partOfSpeech relation rather than wn:partOfSpeech.

How do you suggest we bridge this gap? The way I see it, either version 1.2 of the schema removes this bit and datasets use lexinfo:partOfSpeech directly -OR- a direct equivalency to lexinfo:partOfSpeech is established in the schema -- preferably the first as it simplifies things.

I could also add both wn and lexinfo relations for all LexicalEntry classes in the new DanNet, but that's both confusing and a messy fix IMO. Better to fix the schema than work around its flaws. Having competing standards for this is not a great situation.


The relevant part of the schema:

:partOfSpeech a owl:ObjectProperty ;
  rdfs:domain ontolex:LexicalEntry ;
  rdfs:range :PartOfSpeech ;
  rdfs:label "part of speech"@en ;
  rdfs:comment "The syntactic class of the entry, e.g., noun, verb"@en .

:PartOfSpeech a owl:Class ;
  rdfs:label "part of speech"@en ;
  rdfs:comment "The syntactic class of the entry, e.g., noun, verb"@en ;
  owl:oneOf (
    :noun :verb :adjective :adverb :adjective_satellite :named_entity 
    :conjunction :adposition :other_pos :unknown_pos ) .

:noun a :PartOfSpeech ;
  rdfs:label "noun"@en .
:verb a :PartOfSpeech ;
  rdfs:label "verb"@en .
:adjective a :PartOfSpeech ;
  rdfs:label "adjective"@en .
:adverb a :PartOfSpeech ;
  rdfs:label "adverb"@en .
:adjective_satellite a :PartOfSpeech ;
  rdfs:label "adjective satellite"@en .
:named_entity a :PartOfSpeech ;
  rdfs:label "named entity"@en .
:conjunction a :PartOfSpeech ;
  rdfs:label "conjunction"@en .
:adposition a :PartOfSpeech ;
  rdfs:label "adposition"@en .
:other_pos a :PartOfSpeech ;
  rdfs:label "other pos"@en .
:unknown_pos a :PartOfSpeech ;
  rdfs:label "unknown pos"@en .

jmccrae commented 3 years ago

We chose this because there were some values, e.g., adjective_satellite that aren't in LexInfo and wouldn't really make sense to add. We wanted to avoid mixing namespaces also (e.g., lexinfo:noun with wn:adjective_satellite)

However, adding some owl:sameAs links to LexInfo would be very useful!

simongray commented 3 years ago

Ok, then I will make an attempt at that for 1.2.

simongray commented 3 years ago

Having taken a look at it, there doesn't seem to be a way to make it compatible in a satisfactory way. Defining owl:sameAs doesn't map the actual relations, just the instances.

Unfortunately, lexinfo:partOfSpeech has a definite range (the lexinfo:PartOfSpeech class), so I'm unsure whether it's even possible to define wn:partOfSpeech as a sub-property of lexinfo:partOfSpeech while extending its range to encompass both PartOfSpeech classes. It doesn't seem like it will be possible.

jmccrae commented 3 years ago

Not sure I get what the issue is here. I would assume that wn:PartOfSpeech ⊑ lexinfo:PartOfSpeech for both the class and the property?

simongray commented 3 years ago

I am not an OWL expert, that is probably the main issue ;-)

What you're saying is true. I guess owl:unionOf could be used to define the 1.2 wn:partOfSpeech range to be both wn:PartOfSpeech and lexinfo:PartOfSpeech? Still, not using lexinfo:partOfSpeech directly does make it harder to integrate directly with other Ontolex datasets.

jmccrae commented 3 years ago

If you add a subclass axiom between wn:PartOfSpeech and lexinfo:PartOfSpeech then there is no need for a unionOf statement: every wn:PartOfSpeech is also a lexinfo:PartOfSpeech, so wn:PartOfSpeech ⊔ lexinfo:PartOfSpeech ≡ lexinfo:PartOfSpeech.

That is if we add

wn:partOfSpeech rdfs:subPropertyOf lexinfo:partOfSpeech .
wn:PartOfSpeech rdfs:subClassOf lexinfo:PartOfSpeech .
wn:noun owl:sameAs lexinfo:noun . # etc.

Then if we have

X wn:partOfSpeech wn:noun

We then infer

X lexinfo:partOfSpeech lexinfo:noun
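
The inference chain above can be checked with a toy forward-chaining loop. This is only a sketch in plain Python, not a real OWL reasoner: it implements just the sub-property rule and one-directional owl:sameAs substitution in object position, and ex:X is a placeholder entry name rather than anything from the schema.

```python
# Toy forward chaining over (subject, predicate, object) triples.
# Only two rules are implemented: sub-property propagation and
# owl:sameAs substitution in object position (one direction only).

SUBPROP = "rdfs:subPropertyOf"
SAME_AS = "owl:sameAs"


def infer(triples):
    """Return the closure of `triples` under the two rules above."""
    closed = set(triples)
    while True:
        new = set()
        for s, p, o in closed:
            # (s p o), (p subPropertyOf q)  =>  (s q o)
            for p2, rel, q in closed:
                if rel == SUBPROP and p2 == p:
                    new.add((s, q, o))
            # (s p o), (o sameAs o2)  =>  (s p o2)
            for o1, rel, o2 in closed:
                if rel == SAME_AS and o1 == o:
                    new.add((s, p, o2))
        if new <= closed:  # nothing new was derived, so we are done
            return closed
        closed |= new


schema = {
    ("wn:partOfSpeech", SUBPROP, "lexinfo:partOfSpeech"),
    ("wn:noun", SAME_AS, "lexinfo:noun"),
}
data = {("ex:X", "wn:partOfSpeech", "wn:noun")}  # ex:X is hypothetical

closure = infer(schema | data)
print(("ex:X", "lexinfo:partOfSpeech", "lexinfo:noun") in closure)  # prints True
```

Note that the subclass axiom plays no part in deriving the triple; it only keeps the range of lexinfo:partOfSpeech consistent with the wn values.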

simongray commented 3 years ago

Ah, that makes a lot of sense. I was stuck thinking that of course the subProperty can't extend the range of its parent property, but yeah... if we define everything as subclasses it will technically not be doing that. Thanks a lot for explaining in this detail.

I do wonder what to do about the owl:oneOf relation from the wn:PartOfSpeech currently defined in the schema. I believe that the set of POS tags in lexinfo is more extensive - or at least not finite - so I wonder if it makes sense to also remove this restriction if, say, you wanted to use a lexinfo:PartOfSpeech as the object of a wn:partOfSpeech relation...? What are your thoughts on this?

...

As for the actual inference, I think I will also have to take another look at the Prolog-like logic rule DSL used in Apache Jena to make sure it applies the same kind of reasoning you just described in practice. The default level of OWL inference was a bit too comprehensive (and therefore slow) on the DanNet data, so to make it snappier I basically removed all statements that didn't infer inverse relationships ;-)

jmccrae commented 3 years ago

I do wonder what to do about the owl:oneOf relation from the wn:PartOfSpeech currently defined in the schema. I believe that the set of POS tags in lexinfo is more extensive - or at least not finite - so I wonder if it makes sense to also remove this restriction if, say, you wanted to use a lexinfo:PartOfSpeech as the object of a wn:partOfSpeech relation...? What are your thoughts on this?

It would not be compatible with this schema to use a value of part-of-speech other than ones specified. We need this to ensure interoperability. (Of course we are open to proposals for new values)
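
In practice, a dataset could be checked against this closed list before publishing. The following is a minimal sketch under the assumption that part-of-speech values have already been extracted as local names; the function name and the sample entries are hypothetical, and the ten values are copied from the owl:oneOf list in the schema quoted earlier in this thread.

```python
# The ten values enumerated by owl:oneOf in the schema's :PartOfSpeech class.
WN_POS = frozenset({
    "noun", "verb", "adjective", "adverb", "adjective_satellite",
    "named_entity", "conjunction", "adposition", "other_pos", "unknown_pos",
})


def invalid_pos(entries):
    """Return sorted (entry, pos) pairs whose pos is outside the closed set.

    `entries` maps a lexical entry id to the local name of its
    wn:partOfSpeech value.
    """
    return sorted((e, pos) for e, pos in entries.items() if pos not in WN_POS)


# Hypothetical entries; "determiner" is not in the schema's list.
sample = {"ex:hund-n": "noun", "ex:den-d": "determiner"}
print(invalid_pos(sample))  # prints [('ex:den-d', 'determiner')]
```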

simongray commented 3 years ago

I'm sorry @jmccrae, but I have a couple of remaining questions about things that are still not quite clear to me...

  1. Can you tell me what the distinction is between e.g. the capitalised and all-lowercase versions of POS tags in lexinfo, e.g. Adjective vs adjective? It is not immediately clear to me.
  2. Since you're involved with writing both this schema and the lexinfo one, how come you don't just add e.g. adjective_satellite to lexinfo and use lexinfo directly? Is it because adjective_satellite is non-standard...?

jmccrae commented 3 years ago

Can you tell me what the distinction is between e.g. the capitalised and all-lowercase versions of POS tags in lexinfo, e.g. Adjective vs adjective? It is not immediately clear to me.

Essentially capitalised names are for classes and lower case for values. So Adjective is a subclass of LexicalEntry, while adjective is the value of part-of-speech property. The following equivalence basically holds

X rdf:type ontolex:LexicalEntry and X lexinfo:partOfSpeech lexinfo:adjective <=> X rdf:type lexinfo:Adjective
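
Read as a pair of rules, the equivalence can be applied in both directions. The sketch below is illustrative only: it is plain Python over triples, it covers just the adjective pair named here, and the entry names ex:a and ex:b are made up.

```python
TYPE = "rdf:type"
ENTRY = "ontolex:LexicalEntry"
POS = "lexinfo:partOfSpeech"

# Only the pair mentioned in the thread; other pairs would follow the
# same pattern but are not assumed here.
VALUE_TO_CLASS = {"lexinfo:adjective": "lexinfo:Adjective"}


def expand(triples):
    """Add the triples implied by the equivalence, in both directions."""
    triples = set(triples)
    out = set(triples)
    for value, cls in VALUE_TO_CLASS.items():
        for s, p, o in triples:
            # entry type + value  =>  class membership
            if p == POS and o == value and (s, TYPE, ENTRY) in triples:
                out.add((s, TYPE, cls))
            # class membership  =>  entry type + value
            if p == TYPE and o == cls:
                out.add((s, TYPE, ENTRY))
                out.add((s, POS, value))
    return out


by_value = expand({("ex:a", TYPE, ENTRY), ("ex:a", POS, "lexinfo:adjective")})
by_class = expand({("ex:b", TYPE, "lexinfo:Adjective")})
print(("ex:a", TYPE, "lexinfo:Adjective") in by_value)  # prints True
print(("ex:b", POS, "lexinfo:adjective") in by_class)   # prints True
```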

Since you're involved with writing both this schema and the lexinfo one, how come you don't just add e.g. adjective_satellite to lexinfo and use lexinfo directly? Is it because adjective_satellite is non-standard...?

Exactly