glossarist / concept-model

Glossarist Concept model

Document Glossarist YAML v2 format using YAML Schema #27

Open ronaldtse opened 1 year ago

ronaldtse commented 1 year ago

From https://github.com/glossarist/glossarist-ruby/issues/76#issuecomment-1670487906

We need to document the Glossarist YAML v2 format in this repository using YAML/JSON Schema.

HassanAkbar commented 1 year ago

@ronaldtse I am not sure about the structure of glossarist v2 format. To better understand what it is I was looking at metaversestandards-glossary and I have a few questions related to it.

ronaldtse commented 1 year ago

From what I understand in V2 we removed the localizations from the concept files and moved them to their respective files [...]

Removed from the files and moved them to "their respective files"? What did you mean?

do we need to support camel case or snake case or both in glossarist?

I actually prefer snake case for YAML keys. @ribose-jeffreylau @strogonoff are you okay with this?

HassanAkbar commented 1 year ago

Removed from the files and moved them to "their respective files"? What did you mean?

@ronaldtse I mean that, as in the example below, everything under data comes from the current Glossarist model; it has just been moved to a separate localized-concept file with a UUID, and the concept file now holds a reference to that file.

id: 00250c70-121c-40d3-8230-8b87380dd1ae
data:
  language_code: eng
  terms:
    - normative_status: preferred
      type: expression
      designation: time
  definition:
    - content: monotonically increasing value generated by a node
  notes: []
  examples: []
  authoritativeSource:
    - link: >-
        https://www.web3d.org/specifications/X3Dv4Draft/ISO-IEC19775-1v4-IS.proof/Part01/glossary.html#Time
status: valid
dateAccepted: 2023-08-04T11:33:09.535Z

ronaldtse commented 1 year ago

In v2, every localized concept should be in a separate file, not a single file that contains multiple localized concepts.

ribose-jeffreylau commented 1 year ago

@ronaldtse To me, normal YAML keys are snake_case.
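
The example file quoted earlier in this thread mixes both styles (language_code next to authoritativeSource and dateAccepted). A minimal sketch of how such keys could be listed is shown below. It works on an already-parsed mapping, the function name `find_camel_case_keys` is hypothetical, and the link value is replaced by a placeholder.

```python
import re

# Matches keys such as "authoritativeSource": lowercase start, then at least one capitalized part.
CAMEL_CASE = re.compile(r"^[a-z]+(?:[A-Z][a-z0-9]*)+$")


def find_camel_case_keys(node, path=""):
    """Return dotted paths of every camelCase mapping key in a nested structure."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            if CAMEL_CASE.match(str(key)):
                found.append(child)
            found.extend(find_camel_case_keys(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(find_camel_case_keys(item, f"{path}[{index}]"))
    return found


# Parsed form of the localized-concept example from this thread (link shortened to a placeholder).
localized_concept = {
    "id": "00250c70-121c-40d3-8230-8b87380dd1ae",
    "data": {
        "language_code": "eng",
        "terms": [
            {"normative_status": "preferred", "type": "expression", "designation": "time"}
        ],
        "definition": [{"content": "monotonically increasing value generated by a node"}],
        "notes": [],
        "examples": [],
        "authoritativeSource": [{"link": "..."}],
    },
    "status": "valid",
    "dateAccepted": "2023-08-04T11:33:09.535Z",
}

camel_keys = find_camel_case_keys(localized_concept)
print(camel_keys)
```

A report like this only describes what is in the data; whether to rename any key is a separate decision.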

HassanAkbar commented 1 year ago

In v2, every localized concept should be in a separate file

@ronaldtse And will the structure of the separate file be almost the same as the v1 model?

strogonoff commented 1 year ago

I am a bit alarmed by this discussion.

If we are describing the schema how it effectively is right now, we ought to describe the schema how it effectively is right now. We can then take that as version 1.0 or 0.1 and evolve from there: implement schema version support in consumers and evolve data structures.

If we are not describing the schema how it is but designing the new schema here, we will 1) waste time on consensus and 2) end up with a schema that doesn’t match the data, so then someone will have to make sure all data sources are updated to the new schema, all implementations are updated to the new schema, etc., so depending on other ongoing projects it may easily be months before we can begin establishing some sort of cadence.

strogonoff commented 1 year ago

If the previous comment was ambiguous: let’s describe the schema as it effectively is now and not debate what it should be. That debate is potentially infinite and should take place in the context of schema versioning.

ronaldtse commented 1 year ago

@strogonoff: @HassanAkbar and I are describing how this structure has been implemented in the latest instance of ISO 10303-2, whose implementation is already integrated into Metanorma. We are attempting to reach consensus here with the Glossarist implementation.

ronaldtse commented 1 year ago

@strogonoff please note that the Glossarist YAML format is already used in Geolexica and Metanorma, today.

strogonoff commented 1 year ago

@ronaldtse Exactly. Let’s document what it is in its current state, i.e. snake or camel case as they are used now. Then we can work on making it better version by version.

ronaldtse commented 1 year ago

To be fair, glossarist-ruby is already somewhat flexible in what YAML schema it reads -- it supports the old Geolexica format, and also supports the newer ISO 10303-2 format, and the new format used in the Metaverse Glossary. So this is why there is some confusion on what is the "proper" YAML format.

@HassanAkbar can you please come up with the YAML Schema for the ISO 10303-2 format and then we can discuss it in detail? Please put that in a PR so we can all comment by line... thanks.

HassanAkbar commented 1 year ago

@ronaldtse Skimming through the data sources, I could not find the new version in ISO 10303-2, e.g. concept-3.1.1.1. I did find that isotc211-glossary is using the new Glossarist format, e.g. 0002e0ac-f74e-5ae0-9b58-f459c7d60cfa.

I’m currently going through the documents in detail and will update you once I am done.

strogonoff commented 1 year ago

To be fair, glossarist-ruby is already somewhat flexible in what YAML schema it reads -- it supports the old Geolexica format, and also supports the newer ISO 10303-2 format, and the new format used in the Metaverse Glossary. So this is why there is some confusion on what is the "proper" YAML format.

This doesn’t look like an obstacle if we are documenting the schema as it is currently used. The schema would simply document those fuzzy instances as they are now in data. We can subsequently mark any undesired duplications as deprecated and clarify semantics as we iterate.
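
As a rough illustration of documenting data as it is and flagging duplicates later, the sketch below uses the `deprecated` annotation keyword available in JSON Schema 2019-09 and later. The schema fragment is hypothetical: the snake_case spelling `authoritative_source` is assumed here only to show a duplicate, and marking the camelCase spelling as deprecated is an illustrative choice, not a project decision.

```python
# Hypothetical schema fragment: both spellings are documented side by side.
# Which spelling is marked deprecated is an assumption made purely for illustration.
SCHEMA_FRAGMENT = {
    "type": "object",
    "properties": {
        "authoritative_source": {"type": "array"},
        "authoritativeSource": {"type": "array", "deprecated": True},
    },
}


def deprecated_keys_in_use(instance, schema):
    """List keys present in `instance` whose schema entry carries `deprecated: true`."""
    properties = schema.get("properties", {})
    return sorted(
        key for key in instance
        if properties.get(key, {}).get("deprecated") is True
    )


old_style = {"authoritativeSource": [{"link": "..."}]}
new_style = {"authoritative_source": [{"link": "..."}]}

print(deprecated_keys_in_use(old_style, SCHEMA_FRAGMENT))
print(deprecated_keys_in_use(new_style, SCHEMA_FRAGMENT))
```

Both instances stay valid under such a schema; the annotation only lets tooling report which files still use the older spelling.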

strogonoff commented 1 year ago

One last note: I would advise against YAML Schema. It seems to build on JSON Schema draft 4 (drafts 5, 6, 7, 2019-09 and 2020-12 have come out since), it introduces specific incompatibilities with JSON Schema (e.g., propertyOrder, which seems to have been dropped from discussions of the JSON Schema vocabulary), and it is published ad hoc rather than being backed by, for example, an Internet-Draft (though I may be wrong here).

JSON schema is compatible with YAML as is, not only because JSON is a subset of YAML but also because validators operate on runtime representations of data anyway. In this sense, JSON schema is a bit of a misnomer, since validation typically takes place against runtime JavaScript objects (or Python dictionaries, etc.), and how they are obtained (whether from YAML or JSON) is orthogonal.
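
The point that validation runs on in-memory objects can be sketched with a toy checker. It covers only a tiny subset of JSON Schema (type, required, properties, items) and is not a real implementation; in practice a full validator library would be used. The schema itself, including which keys are required, is an assumption made for illustration.

```python
import json


def check(instance, schema, path="$"):
    """Toy checker for a tiny subset of JSON Schema: type, required, properties, items."""
    type_map = {"object": dict, "array": list, "string": str}
    errors = []
    expected = schema.get("type")
    if expected and not isinstance(instance, type_map[expected]):
        return [f"{path}: expected {expected}"]
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required key '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(check(instance[key], subschema, f"{path}.{key}"))
    if isinstance(instance, list) and "items" in schema:
        for index, item in enumerate(instance):
            errors.extend(check(item, schema["items"], f"{path}[{index}]"))
    return errors


# Hypothetical schema: the required keys are assumptions, not the agreed Glossarist format.
SCHEMA = {
    "type": "object",
    "required": ["id", "data"],
    "properties": {
        "id": {"type": "string"},
        "data": {
            "type": "object",
            "required": ["language_code"],
            "properties": {
                "language_code": {"type": "string"},
                "terms": {
                    "type": "array",
                    "items": {"type": "object", "required": ["designation"]},
                },
            },
        },
    },
}

# The checker never sees the source text, only the parsed dict, so the same
# dict obtained from a YAML loader would be checked in exactly the same way.
document = json.loads(
    '{"id": "00250c70-121c-40d3-8230-8b87380dd1ae",'
    ' "data": {"language_code": "eng", "terms": [{"designation": "time"}]}}'
)

valid_errors = check(document, SCHEMA)
invalid_errors = check({"id": "x", "data": {"terms": "time"}}, SCHEMA)
print(valid_errors)
print(invalid_errors)
```

Because `check` takes plain dicts and lists, the serialization format of the source file does not matter to it.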

Other than that, I don’t have many opinions. As mentioned on Zulip, an update to how Glossarist data is represented (ditching universal concepts and some other changes) is likely coming in Glossarist format v3 (the current version with universal and localized concepts is v2), but I don’t see why we shouldn’t document the schema as is while the next version is being fleshed out.

ronaldtse commented 1 year ago

YAML Schema is very well adopted in industry, e.g.

This is the very first time I've heard of Glossarist format v3, so please explain what the intended changes are before we move ahead with that.