Open ronaldtse opened 1 year ago
@ronaldtse I am not sure about the structure of glossarist v2 format. To better understand what it is I was looking at metaversestandards-glossary and I have a few questions related to it.
From what I understand, in V2 we removed the localizations from the concept files and moved them to their respective files and assigned an ID; other than that, the keys and structure are almost the same. Is this correct, or are there some structural changes as well that I have missed?
I noticed that the keys are in camel casing, e.g. concept/01fa30d3-4e4b-4142-b68a-c299b55b3fb8:
id: 01fa30d3-4e4b-4142-b68a-c299b55b3fb8
data:
  identifier: '88'
  localizedConcepts:
    eng: 2b959537-c600-41d1-aeb7-7233c35d30eb
status: valid
dateAccepted: 2023-08-04T11:33:09.535Z
Do we need to support camel case or snake case or both in glossarist?
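If both casings had to be accepted on read, one option would be to normalize keys before the data reaches the model. The following is only a minimal sketch under that assumption, not how glossarist does it; `to_snake` and `normalize_keys` are hypothetical helper names, and the sample record reuses the concept shown above.

```python
import re


def to_snake(key):
    """Turn a camelCase key into snake_case; snake_case keys pass through unchanged."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", lambda m: "_" + m.group(1).lower(), key)


def normalize_keys(node):
    """Recursively rename mapping keys; values such as UUIDs and dates are not touched."""
    if isinstance(node, dict):
        return {
            (to_snake(k) if isinstance(k, str) else k): normalize_keys(v)
            for k, v in node.items()
        }
    if isinstance(node, list):
        return [normalize_keys(item) for item in node]
    return node


# The concept from the example above, as a parser would hand it over.
concept = {
    "id": "01fa30d3-4e4b-4142-b68a-c299b55b3fb8",
    "data": {
        "identifier": "88",
        "localizedConcepts": {"eng": "2b959537-c600-41d1-aeb7-7233c35d30eb"},
    },
    "status": "valid",
    "dateAccepted": "2023-08-04T11:33:09.535Z",
}

normalized = normalize_keys(concept)
print(sorted(normalized))
```

Normalizing on read keeps a single key style inside the consumer, whichever style ends up being the documented one.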
From what I understand in V2 we removed the localizations from the concept files and moved them to their respective files [...]
Removed from the files and moved them to "their respective files"? What did you mean?
do we need to support camel case or snake case or both in glossarist?
I actually prefer snake case for YAML keys. @ribose-jeffreylau @strogonoff are you okay with this?
Removed from the files and moved them to "their respective files"? What did you mean?
@ronaldtse I mean like in the example below: everything under data is from the current glossarist model, just moved to a separate localized-concept file with a uuid, and in the concept file we are adding a reference to this file.
id: 00250c70-121c-40d3-8230-8b87380dd1ae
data:
  language_code: eng
  terms:
    - normative_status: preferred
      type: expression
      designation: time
  definition:
    - content: monotonically increasing value generated by a node
  notes: []
  examples: []
  authoritativeSource:
    - link: >-
        https://www.web3d.org/specifications/X3Dv4Draft/ISO-IEC19775-1v4-IS.proof/Part01/glossary.html#Time
status: valid
dateAccepted: 2023-08-04T11:33:09.535Z
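To make the concept-to-localized-concept reference concrete, here is a rough sketch of how a consumer might follow it. It works on plain dictionaries standing in for parsed files; `resolve_localizations` is a hypothetical helper, and the concept record is invented so that its reference matches the localized-concept id above.

```python
# Stand-in for the parsed localized-concept file from the example above (abridged).
localized_concept = {
    "id": "00250c70-121c-40d3-8230-8b87380dd1ae",
    "data": {
        "language_code": "eng",
        "terms": [
            {"normative_status": "preferred", "type": "expression", "designation": "time"}
        ],
    },
}

# Hypothetical concept record: the id and identifier are made up, and the "eng"
# reference is set to the localized concept's id so the two records link up.
concept = {
    "id": "hypothetical-concept-id",
    "data": {
        "identifier": "hypothetical-identifier",
        "localizedConcepts": {"eng": "00250c70-121c-40d3-8230-8b87380dd1ae"},
    },
}


def resolve_localizations(concept, localized_by_id):
    """Map each language code to the localized-concept record its UUID points at."""
    refs = concept["data"].get("localizedConcepts", {})
    missing = [uuid for uuid in refs.values() if uuid not in localized_by_id]
    if missing:
        raise KeyError(f"unresolved localized concept ids: {missing}")
    return {lang: localized_by_id[uuid] for lang, uuid in refs.items()}


# Index of localized concepts by id, as if every localized-concept file had been loaded.
localized_by_id = {localized_concept["id"]: localized_concept}
resolved = resolve_localizations(concept, localized_by_id)
print(resolved["eng"]["data"]["terms"][0]["designation"])
```

A dangling reference raises an error here instead of being skipped silently; whether that is the right behaviour for glossarist is a separate question.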
In v2, every localized concept should be in a separate file, not a single file that contains multiple localized concepts.
@ronaldtse To me, normal YAML keys are snake_case.
In v2, every localized concept should be in a separate file
@ronaldtse And will the separate file structure be almost the same as the v1 model?
I am a bit alarmed by this discussion.
If we are describing the schema how it effectively is right now, we ought to describe the schema how it effectively is right now. We can then take that as version 1.0 or 0.1 and evolve from there: implement schema version support in consumers and evolve data structures.
If we are not describing the schema how it is but designing the new schema here, we will 1) waste time on consensus and 2) end up with a schema that doesn’t match the data, so then someone will have to make sure all data sources are updated to the new schema, all implementations are updated to the new schema, etc., so depending on other ongoing projects it may easily be months before we can begin establishing some sort of cadence.
If the previous comment was ambiguous: let’s describe the schema how it effectively is now and not debate what it should be. That process is potentially infinite and should take place in the context of schema versioning.
@strogonoff: @HassanAkbar and I are describing how this structure has been implemented in the latest instance of ISO 10303-2, whose implementation is already integrated into Metanorma. We are attempting to reach consensus here with the Glossarist implementation.
@strogonoff please note that the Glossarist YAML format is already used in Geolexica and Metanorma, today.
@ronaldtse Exactly. Let’s document what it is in its current state, i.e. snake or camel case as they are used now. Then we can work on making it better version by version.
To be fair, glossarist-ruby is already somewhat flexible in what YAML schema it reads -- it supports the old Geolexica format, and also supports the newer ISO 10303-2 format, and the new format used in the Metaverse Glossary. So this is why there is some confusion on what is the "proper" YAML format.
@HassanAkbar can you please come up with the YAML Schema for the ISO 10303-2 format and then we can discuss it in detail? Please put that in a PR so we can all comment by line... thanks.
@ronaldtse Skimming through the data sources, I could not find the new version in ISO 10303-2, e.g. concept-3.1.1.1. I did find that isotc211-glossary is using the new glossarist format, e.g. 0002e0ac-f74e-5ae0-9b58-f459c7d60cfa.
I’m currently going through the documents in detail and will update you once I am done.
To be fair, glossarist-ruby is already somewhat flexible in what YAML schema it reads -- it supports the old Geolexica format, and also supports the newer ISO 10303-2 format, and the new format used in the Metaverse Glossary. So this is why there is some confusion on what is the "proper" YAML format.
This doesn’t look like an obstacle if we are documenting the schema as it is currently used. The schema would simply document those fuzzy instances as they are now in data. We can subsequently mark any undesired duplications as deprecated and clarify semantics as we iterate.
One last note: I would advise against YAML schema, which seems to build on top of JSON schema 4 (since then JSON schema versions 5, 6, 7, 2019 and 2020 have come out), introduces specific incompatibilities with JSON schema (e.g., propertyOrder, which seems to have been dropped from discussions on JSON schema vocabulary), and is published ad hoc rather than being backed by an Internet Draft, for example (though I may be wrong here).
JSON schema is compatible with YAML as is, not only because JSON is a subset of YAML but also because validators operate on runtime representations of data anyway. In this sense, JSON schema is a bit of a misnomer, since validation typically takes place against runtime JavaScript objects (or Python dictionaries, etc.), and how they are obtained (whether from YAML or JSON) is orthogonal.
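As a rough illustration of that point, the toy checker below validates parsed data against a schema-like dictionary without knowing which serialization the data came from. It is not a JSON schema implementation: it handles only a small subset of keywords (type, required, properties), the schema itself is a made-up example rather than the Glossarist schema, and a real validator library would take its place in practice.

```python
import json

# Only the handful of types this sketch needs.
TYPE_MAP = {"object": dict, "array": list, "string": str}


def validate(instance, schema, path="$"):
    """Return a list of error strings for a tiny subset of JSON schema keywords."""
    expected = schema.get("type")
    if expected and not isinstance(instance, TYPE_MAP[expected]):
        return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required key '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    return errors


# Hypothetical schema fragment, for illustration only.
schema = {
    "type": "object",
    "required": ["id", "data"],
    "properties": {
        "id": {"type": "string"},
        "data": {"type": "object", "required": ["identifier"]},
    },
}

# The same record obtained two ways: parsed from JSON text, and as the
# dictionary a YAML parser would return for the equivalent YAML document.
from_json = json.loads('{"id": "01fa30d3", "data": {"identifier": "88"}}')
from_yaml = {"id": "01fa30d3", "data": {"identifier": "88"}}

errors_json = validate(from_json, schema)
errors_yaml = validate(from_yaml, schema)
errors_broken = validate({"id": "01fa30d3", "data": {}}, schema)
print(errors_json, errors_yaml, errors_broken)
```

The checker never sees YAML or JSON text, only dictionaries, lists and strings, which is why the schema language does not have to match the file format.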
Other than that, I don’t have many opinions. As mentioned on Zulip, an update to how Glossarist data is represented (ditching universal concepts and some other changes) is likely coming in Glossarist format v3 (the current version with universal and localized concepts is v2), but I don’t see why we shouldn’t document the schema as is while the next version is being fleshed out.
YAML Schema is very well adopted in industry, e.g.
This is the very first time I've heard of Glossarist format v3, so please help explain what the intended changes are before we move ahead with that.
From https://github.com/glossarist/glossarist-ruby/issues/76#issuecomment-1670487906
We need to document the Glossarist YAML v2 format in this repository using YAML/JSON Schema.