globalwordnet / schemas

WordNet-LMF formats
https://globalwordnet.github.io/schemas/

2.0 #11

Closed: 1313ou closed this 4 years ago

1313ou commented 4 years ago

XSD Schema 2.0

(from README)

This is to equip WordNet with state-of-the-art validation schemas, as FrameNet did. The move is motivated by the following:

namespaces

References to ILI and PWN have been moved to their own namespaces (ili: and pwn: respectively), as have the meta annotations (meta:).

It is not desirable to mix foreign references (references to something outside the database, the PWN sensekey being a case in point) with internal references. Separate namespaces reflect this distinction.
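A minimal Python sketch of the distinction, using only the standard library's xml.etree.ElementTree: attributes without a namespace are treated as internal references, and namespaced ones (ili:, pwn:) as foreign references. The namespace URIs, the attribute names pwn:sensekey and ili:id, and all id values are hypothetical placeholders, not the schema's actual definitions.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs: the README excerpt does not state the real ones.
ILI_NS = "http://example.org/ili"
PWN_NS = "http://example.org/pwn"

# Illustrative fragment; ids, sensekey and ILI value are made up.
DOC = f"""<Lexicon xmlns:ili="{ILI_NS}" xmlns:pwn="{PWN_NS}" id="example">
  <LexicalEntry id="example-dog-n">
    <Lemma writtenForm="dog" partOfSpeech="n"/>
    <Sense id="example-dog-n-1" synset="example-1-n" pwn:sensekey="dog%1:05:00::"/>
  </LexicalEntry>
  <Synset id="example-1-n" partOfSpeech="n" ili:id="i1"/>
</Lexicon>"""

def split_references(elem):
    """Separate plain (internal) attributes from namespaced (foreign) ones."""
    internal, foreign = {}, {}
    for name, value in elem.attrib.items():
        if name.startswith("{"):
            # ElementTree spells namespaced attribute names as "{uri}local".
            uri, local = name[1:].split("}", 1)
            foreign[(uri, local)] = value
        else:
            internal[name] = value
    return internal, foreign

root = ET.fromstring(DOC)
internal, foreign = split_references(root.find(".//Sense"))
print(internal)  # {'id': 'example-dog-n-1', 'synset': 'example-1-n'}
print(foreign)   # {('http://example.org/pwn', 'sensekey'): 'dog%1:05:00::'}
```

Because the namespace travels with the attribute name, a validator or consumer can tell foreign references apart without a hard-coded list of attribute names.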

modules

The design is modular:

- ili.xsd and pwn.xsd for everything ILI- and PWN-related, each in its own namespace (ili: and pwn:)
- meta.xsd for the meta namespace (meta:)
- (ewn-)idtypes(-relax_idrefs).xsd for core id types (defines the ID policy)
- (ewn-)wordtypes.xsd for word types (defines the word form policy)
- types.xsd for core data types
- core-2.0.xsd for elements and the core structure

This allows for different levels of validation to be performed.

This makes it possible to bring stricter constraints to bear on the same data, but it does not make one level incompatible with the next. For example, any data that satisfies EWN-LMF-2.0.xsd is also valid under WN-LMF-2.0.xsd: the set of documents accepted by EWN-LMF-2.0 is a subset of the set accepted by WN-LMF-2.0.

Another use is applying different IDREF validation depending on whether or not you are validating merged files.

id types

idtypes-2.0.xsd and ewn-idtypes-2.0.xsd differ in that the latter imposes extra constraints on the well-formedness of EWN ids.
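To illustrate how a stricter id policy accepts a subset of what a looser one accepts, here is a Python sketch. Both patterns are assumptions for illustration: the base check approximates an XML NCName with an ASCII-only regex, and the synset-id shape ewn-<8 digits>-<pos> is a hypothetical stand-in for whatever ewn-idtypes-2.0.xsd actually enforces.

```python
import re

# Assumed base policy: an XML NCName, approximated with an ASCII-only regex.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

# Hypothetical stricter EWN policy for synset ids: "ewn-" + 8 digits + "-" + POS letter.
EWN_SYNSET_ID = re.compile(r"ewn-\d{8}-[nvars]")

def valid_base(identifier: str) -> bool:
    return NCNAME.fullmatch(identifier) is not None

def valid_ewn(identifier: str) -> bool:
    # The strict layer re-applies the base check and then adds its own constraint,
    # so every id it accepts is also accepted by valid_base.
    return valid_base(identifier) and EWN_SYNSET_ID.fullmatch(identifier) is not None

samples = ["ewn-02084071-n", "my-synset-1", "ewn-2084071-n", "1bad"]
for s in samples:
    print(s, valid_base(s), valid_ewn(s))
```

Building the strict check on top of the base check is what guarantees the subset relation; the same layering applies when an XSD module restricts a type defined in another.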

relaxed vs strict id types

This deals with id reference validation.

(ewn-)idtypes-2.0.xsd and (ewn-)idtypes-2.0-relax_idrefs.xsd differ in that the latter allows some non-local references to have their target outside the current file. This is necessary for cross-part-of-speech references such as those found in derivation relations (an adjective derived from a noun, etc.) and possibly other relations (see also, etc.), whose target then resides in a different file. The relaxed mode is useful for validating lexicographer files before merging, while the strict mode must be used to validate the merged file, to make sure no references are left dangling.
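The difference between the two modes can be sketched in Python as a check for relation targets that are not defined in the same file; in relaxed mode, relations whose type is on an allow-list may point elsewhere. Element and attribute names follow WN-LMF 1.x (SenseRelation, SynsetRelation, relType, target); the allow-list CROSS_FILE_REL_TYPES and the sample ids are assumptions, not taken from the XSDs.

```python
import xml.etree.ElementTree as ET

# Assumption: relation types allowed to target another file in relaxed mode.
# The README names derivation and "see also" only as examples.
CROSS_FILE_REL_TYPES = {"derivation", "also"}
RELATION_TAGS = {"SenseRelation", "SynsetRelation"}

def dangling_references(root, relaxed=False):
    """Return (tag, relType, target) for each relation whose target is not defined locally."""
    local_ids = {e.get("id") for e in root.iter() if e.get("id") is not None}
    problems = []
    for rel in root.iter():
        if rel.tag not in RELATION_TAGS:
            continue
        target = rel.get("target")
        if target in local_ids:
            continue  # resolved inside this file
        if relaxed and rel.get("relType") in CROSS_FILE_REL_TYPES:
            continue  # tolerated: expected to resolve after merging
        problems.append((rel.tag, rel.get("relType"), target))
    return problems

# Illustrative noun lexicographer file with made-up ids.
NOUN_FILE = """<Lexicon id="ewn">
  <LexicalEntry id="ewn-runner-n">
    <Lemma writtenForm="runner" partOfSpeech="n"/>
    <Sense id="ewn-runner-n-1" synset="ewn-10000001-n">
      <SenseRelation relType="derivation" target="ewn-run-v-1"/>
    </Sense>
  </LexicalEntry>
  <Synset id="ewn-10000001-n" partOfSpeech="n">
    <SynsetRelation relType="hypernym" target="ewn-10000002-n"/>
  </Synset>
</Lexicon>"""

root = ET.fromstring(NOUN_FILE)
print(dangling_references(root))                # strict: both relations reported
print(dangling_references(root, relaxed=True))  # relaxed: only the hypernym
```

In a pipeline, the relaxed check would run per lexicographer file and the strict check once on the merged file, mirroring the two XSD variants.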

Some resulting combinations:

- WN-LMF-2.0-relax_idrefs.xsd
- WN-LMF-2.0.xsd
- EWN-LMF-2.0-relax_idrefs.xsd
- EWN-LMF-2.0.xsd

migration

A migration tool (to2.0.xsl) is provided in the form of an XSLT 1.0 transform. It changes neither the structure nor the data: only attributes are transformed to match the new naming and namespaces.
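The kind of change to2.0.xsl makes (renaming attributes into namespaces while leaving elements and values untouched) can be sketched in Python with the standard library. The RENAMES table and the ILI namespace URI below are hypothetical; the actual mapping lives in the XSLT transform itself.

```python
import xml.etree.ElementTree as ET

ILI_NS = "http://example.org/ili"  # hypothetical namespace URI
ET.register_namespace("ili", ILI_NS)  # serialize with an "ili:" prefix

# Hypothetical renaming table: (element tag, old attribute) -> new attribute,
# written in ElementTree's "{namespace}local" notation.
RENAMES = {
    ("Synset", "ili"): f"{{{ILI_NS}}}id",
}

def migrate(root):
    """Rename attributes in place; elements, nesting and values are left untouched."""
    for elem in root.iter():
        for (tag, old), new in RENAMES.items():
            if elem.tag == tag and old in elem.attrib:
                elem.set(new, elem.attrib.pop(old))
    return root

root = migrate(ET.fromstring('<Lexicon id="ewn"><Synset id="ewn-1-n" ili="i1"/></Lexicon>'))
print(ET.tostring(root, encoding="unicode"))
```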

EWN compatibility with the 2.0 schema

The transformed lexicographer files satisfy both:

The transformed merged file satisfies both:

Validation tools

- Preferred validation tool (based on Saxon; fast and efficient)
- Basic validation tool (based on the standard validation tools that come with Java 8; may be slow)

arademaker commented 4 years ago

Why save versions into directories instead of using GitHub releases?

arademaker commented 4 years ago

Hi @1313ou, can you explain your motivations and changes? What is different between versions 1.1 and 2.0? Why keep both in the repository?

1313ou commented 4 years ago

Better have a look here

I don't want to overwrite things. The DTD is OK, just very weak and out of style. Better to take inspiration from FrameNet.

arademaker commented 4 years ago

What is XEWN? Sorry, many abbreviations have been introduced over the years for different versions of wordnets.


1313ou commented 4 years ago

It's a fork of EWN. It is documented at https://github.com/1313ou/english-wordnet, is due to move to https://github.com/x-englishwordnet/, and lives off EWN's shortcomings.

jmccrae commented 4 years ago

I am going to close this PR as it does not fix any reported issues. Please report issues first.