globalwordnet / english-wordnet

The Open English WordNet
https://en-word.net/

Hierarchy of domains? #442

Closed: rwingerter55 closed this issue 4 years ago

rwingerter55 commented 4 years ago

The values of dcterms:subject can be regarded as a top ontology for PWN and other wordnets, so it is worthwhile to understand their content, usage, structure and quality.

In the Open English WordNet, each ontolex:LexicalConcept has exactly one subject (predicate dcterms:subject). There are 45 subjects, named after the lexicographer files of Princeton WordNet (PWN). The PWN documentation gives a short description of the contents of each file.
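The "exactly one subject per concept" property is easy to check once the (concept, dcterms:subject) pairs have been extracted from the RDF. The following is a minimal sketch, assuming the pairs are already available as plain Python tuples; the concept IRIs in the example (`ex:c1` and so on) are hypothetical placeholders, not real OEWN identifiers.

```python
from collections import defaultdict


def check_single_subject(pairs):
    """Check that every concept has exactly one dcterms:subject value.

    pairs: iterable of (concept_iri, subject) tuples, assumed to be
    extracted beforehand from the ?concept dcterms:subject ?subject triples.
    Returns (violations, all_subjects), where violations maps each concept
    with zero or several subjects to its set of subjects.
    """
    by_concept = defaultdict(set)
    for concept, subject in pairs:
        by_concept[concept].add(subject)

    violations = {c: s for c, s in by_concept.items() if len(s) != 1}
    # set().union() with no arguments returns an empty set, so this is
    # safe for empty input as well.
    all_subjects = set().union(*by_concept.values())
    return violations, all_subjects


# Hypothetical sample: ex:c3 is deliberately given two subjects.
sample = [
    ("ex:c1", "noun.Tops"),
    ("ex:c2", "noun.act"),
    ("ex:c3", "noun.act"),
    ("ex:c3", "noun.event"),
]
violations, subjects = check_single_subject(sample)
```

On the full OEWN data, `violations` should be empty and `len(subjects)` should be 45 if the statement above holds.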

Let us take a closer look at nouns. Nouns are divided into 26 semantic fields. The most general one is "noun.Tops"; the others are more specific. A SPARQL query can show whether there are unambiguous hierarchical relations between the other semantic fields, for example between noun.act, noun.event and noun.process. The query counts the wn:hyponym links whose broader and narrower concepts belong to different subjects:

broaderSubject   narrowSubject   count of wn:hyponym relations
noun.Tops        noun.act        41
noun.Tops        noun.event       9
noun.Tops        noun.process    25
noun.act         noun.event       5
noun.act         noun.process     3
noun.event       noun.act         8
noun.process     noun.act        14
noun.process     noun.event       1
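The SPARQL query itself is not shown in the issue. As a rough illustration of the aggregation it performs, here is a Python sketch that produces the same kind of (broaderSubject, narrowSubject) counts. It assumes that `X wn:hyponym Y` means Y is the narrower concept, that the links and the concept-to-subject mapping have already been extracted, and it uses hypothetical concept IRIs.

```python
from collections import Counter


def cross_subject_counts(subject_of, hyponym_links):
    """Count hyponym links grouped by (broader subject, narrower subject).

    subject_of: dict mapping concept IRI -> subject (lexicographer file).
    hyponym_links: iterable of (broader, narrower) concept pairs, assumed to
    come from triples of the form  broader wn:hyponym narrower.
    Links inside a single subject are skipped, mirroring a SPARQL
    FILTER(?broaderSubject != ?narrowSubject) with GROUP BY and COUNT.
    """
    counts = Counter()
    for broader, narrower in hyponym_links:
        pair = (subject_of[broader], subject_of[narrower])
        if pair[0] != pair[1]:
            counts[pair] += 1
    return counts


# Hypothetical sample data, not taken from OEWN.
subject_of = {
    "ex:a": "noun.Tops",
    "ex:b": "noun.act",
    "ex:c": "noun.event",
    "ex:d": "noun.act",
}
links = [("ex:a", "ex:b"), ("ex:b", "ex:c"), ("ex:c", "ex:d"), ("ex:a", "ex:c")]
counts = cross_subject_counts(subject_of, links)
```

Even this tiny sample shows the pattern in the table: noun.act has a narrower noun.event concept and noun.event has a narrower noun.act concept.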

The results do not show a clear hierarchy between act, event and process: there are hyponym links in both directions between act and event and between act and process. The same may be true for other subjects. Does anyone know whether this is intended or caused by faulty data?
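One way to make "no clear hierarchy" concrete is to look for subject pairs that have links in both directions, because a strict hierarchy between two subjects would allow links in one direction only. The sketch below applies that check to the counts from the table; the function name is illustrative.

```python
def bidirectional_pairs(counts):
    """Return subject pairs with hyponym links in both directions.

    counts: dict mapping (broader_subject, narrower_subject) -> link count.
    Each unordered pair is reported once, keyed in alphabetical order,
    with the value (count A->B, count B->A).
    """
    result = {}
    for (broader, narrower), n in counts.items():
        reverse = counts.get((narrower, broader), 0)
        if reverse and broader < narrower:
            result[(broader, narrower)] = (n, reverse)
    return result


# Counts copied from the table in this issue.
TABLE = {
    ("noun.Tops", "noun.act"): 41,
    ("noun.Tops", "noun.event"): 9,
    ("noun.Tops", "noun.process"): 25,
    ("noun.act", "noun.event"): 5,
    ("noun.act", "noun.process"): 3,
    ("noun.event", "noun.act"): 8,
    ("noun.process", "noun.act"): 14,
    ("noun.process", "noun.event"): 1,
}
conflicts = bidirectional_pairs(TABLE)
```

With this table, `conflicts` contains act/event (5 vs. 8) and act/process (3 vs. 14). noun.Tops only appears as the broader subject, and process/event has links in one direction only.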

jmccrae commented 4 years ago

Do you have any suggestions here?

rwingerter55 commented 4 years ago

After a closer look, I think "noun.Tops" and the more specific subjects from the PWN lexicographer files are not a top ontology, but just a way of clustering concepts to make the lexicographers' job easier.

IMO they are quite useful. In VocBench3 I created the subjects as conceptSets, which can be used in the UI to filter by subject. I also marked the topConcepts of each conceptSet, which gives a nice hierarchical display in the concept panel.
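The comment does not say how the topConcepts were selected. One plausible rule, given here only as an assumption, is that a concept is a top concept of its subject's conceptSet if none of its hypernyms has the same subject. A minimal Python sketch of that rule, with hypothetical concept IRIs:

```python
from collections import defaultdict


def top_concepts(subject_of, hyponym_links):
    """Pick top concepts per subject (assumed rule, see lead-in).

    A concept is treated as top in its subject if no hyponym link comes
    into it from a broader concept with the same subject.
    subject_of: dict concept IRI -> subject.
    hyponym_links: iterable of (broader, narrower) concept pairs.
    Returns a dict subject -> set of top concept IRIs.
    """
    has_same_subject_parent = set()
    for broader, narrower in hyponym_links:
        if subject_of[broader] == subject_of[narrower]:
            has_same_subject_parent.add(narrower)

    tops = defaultdict(set)
    for concept, subject in subject_of.items():
        if concept not in has_same_subject_parent:
            tops[subject].add(concept)
    return dict(tops)


# Hypothetical sample: ex:c sits below ex:b inside noun.act, so only ex:b
# is top there; ex:d is top in noun.event although its parent is in noun.act.
subject_of = {
    "ex:a": "noun.Tops",
    "ex:b": "noun.act",
    "ex:c": "noun.act",
    "ex:d": "noun.event",
}
links = [("ex:a", "ex:b"), ("ex:b", "ex:c"), ("ex:b", "ex:d")]
tops = top_concepts(subject_of, links)
```

Because hyponym links cross subjects in both directions, this rule can give a subject several top concepts. That is consistent with reading the subjects as clusters rather than as nodes of a strict hierarchy.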

rwingerter55 commented 4 years ago

AFAIC you can close the issue.