Open dthaler opened 10 months ago
@tychonievich comments?
I don't understand what problem you are trying to solve nor what problem you are finding with the current system. Maybe you are trying to create YAML files to help parse undocumented extensions? But I don't see how this proposed change actually helps solve the hard problem there, i.e., that the same extTag is used to mean different things by different applications.
I added `extension tags` to the YAML with the thought that it would serve as a hint when picking tags for URI-identified structures during serialization, hopefully (a) increasing the human readability of the resulting files and (b) increasing the chance that incomplete implementations (ones that don't parse the schema) might treat the data correctly. I'm fairly confident that those are not the use cases you are referring to in this issue. I also realize that those purposes are not mentioned in the format definition and probably should be.
I don't understand what problem you are trying to solve nor what problem you are finding with the current system. Maybe you are trying to create YAML files to help parse undocumented extensions?
Yes, that's one.
But I don't see how this proposed change actually helps solve the hard problem there, i.e., that the same extTag is used to mean different things by different applications.
It solves it by using subsumes with application-specific URIs that can be derived from existing GEDCOM files without changes.
I added `extension tags` to the YAML with the thought that it would serve as a hint when picking tags for URI-identified structures during serialization,
I don't follow how `extension tags` can be used for anything useful during serialization. If there's only one, then it provides no value over just using `tag` as in my suggestion. If there's more than one, I don't follow how one would choose what to use, other than always choosing the first.
hopefully (a) increasing the human readability of the resulting files and (b) increasing the chance that incomplete implementations (ones that don't parse the schema) might treat the data correctly. I'm fairly confident that those are not the use cases you are referring to in this issue. I also realize that those purposes are not mentioned in the format definition and probably should be.
I can neither agree nor disagree, since I don't yet understand the use you suggest.
After some reflection and playing with some examples, I agree that having a single `tag` and using `subsumes` could cover most use cases I had in mind when adding `extension tags`; for example, `_AKA` and `_AKAN` could be given separate URIs and marked as subsuming one another instead of being stored in a single YAML.
A unified `tag` can't provide backup tags when merging files that use extension structures that conflict, like `_TODO` (see GEDCOM-L's list for how RootsMagic and WebTrees use that tag for different structures in the same context). That said, I'm not sure how valuable that use case is; picking a random tag or appending something to the expected tag might be enough.
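The fallback mentioned above (appending something to the expected tag when it collides) might look roughly like the following Python sketch. The function name `pick_extension_tag`, the example.com URIs, and the numeric-suffix scheme are all assumptions made for illustration; none of them is defined by the format.

```python
def pick_extension_tag(uri, preferred_tags, tags_in_use):
    """Pick a tag for `uri` that is not already bound to a different URI.

    preferred_tags: hint tags for this URI, in order of preference (hypothetical input).
    tags_in_use: dict mapping tags already chosen for this file to their URIs.
    """
    # Try each hinted tag in order; accept it if it is free or already means this URI.
    for tag in preferred_tags:
        if tags_in_use.get(tag, uri) == uri:
            tags_in_use[tag] = uri
            return tag
    # Every hint collides: append a numeric suffix to the first hint (or a generic stem).
    stem = preferred_tags[0] if preferred_tags else "_EXT"
    n = 2
    while tags_in_use.get(f"{stem}{n}", uri) != uri:
        n += 1
    tags_in_use[f"{stem}{n}"] = uri
    return f"{stem}{n}"


# Hypothetical URIs standing in for two applications' different _TODO structures.
in_use = {}
first = pick_extension_tag("https://example.com/app-a/_TODO", ["_TODO"], in_use)
second = pick_extension_tag("https://example.com/app-b/_TODO", ["_TODO"], in_use)
```

With a list of several hint tags, the loop would try each one before falling back to a suffix, which is the "backup tags" behaviour a single unified tag cannot express.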
When I added `extension tags` I had in mind possibly supporting some of the illegal standard tags for extension structures that various applications have added, like The Master Genealogist's `ENMPL`. But I never explored that further, and haven't thought through the pros and cons of allowing that in any detail, in part because I decided not to pursue YAML files for undocumented extensions. However, as you are now proposing those YAML files I'm thinking about that again and note that a separate key would be needed for this use case. Whether it should be `extension tags` or a new `illegal standard tags` or the like I don't have a strong opinion about.
A unified `tag` might accidentally imply that applications can assume that an undocumented extension tag translates to the given URI. I worry it might encourage applications to omit the schema because they think the tag is enough.
Unifying `tag` is conflating two semantics. A `standard tag` must identify a unique URI in its context, always means that URI in that context, and should be used in serialization. An entry in `extension tags` is just a presentation hint, can be changed to avoid tag name collisions, and is insufficient to determine URI by itself.
Currently we document those differences in the key of the YAML entry: `standard tag` has one meaning, `extension tags` the other. If we merge them we'd need to document those differences in terms of the form of the value of that key: if it starts with `_` it has one meaning, if it doesn't it has a different meaning. I'd rather keep the semantic complexity in the keys, simplifying code and matching file structure to semantic intent, but it's a matter of choice.
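To make the trade-off concrete, here is a rough Python sketch of what a consumer would have to do if the two keys were merged: infer the meaning of the value from its form. The function name and the returned labels are assumptions for illustration only; the leading-underscore test follows the rule described above.

```python
def tag_semantics(tag):
    """Classify a merged `tag` value by its form (hypothetical helper)."""
    if tag.startswith("_"):
        # Extension tag: a presentation hint only; it cannot determine the URI by itself.
        return "hint"
    # Standard tag: identifies a unique URI in its context and is used when serializing.
    return "identifies-uri"
```

With separate keys, the same distinction is read directly from which key is present, so no inspection of the value is needed.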
Am I right in thinking that the proposed change does not enable any new functionality, and only changes how some situations are presented?
Currently https://gedcom.io/terms/format has:
However, `extTag` is ambiguous. That is, two separate applications might use the same `extTag` with very different meanings, even under the same superstructure. As such, simply listing the `extTag` under `extension tags` can cause tools that consume the YAML to do the wrong thing with GEDCOM files. A URI on the other hand would be unambiguous. So would the combination of `HEAD.SOUR` payload plus `extTag`.
I claim that the new `subsumes` key can be used to more accurately represent the intent of the `extension tags` key, in an unambiguous way. Now that we have `subsumes`, I believe `extension tags` provides no real value and I would propose replacing `standard tag` and `extension tags` with just `tag` (which could be a standard tag or an `extTag`) and the existing `subsumes`.
To resolve ambiguity of different applications using the same `extTag`, a URI is required for use with `subsumes`, even for undocumented extension tags. A proposal to construct such a URI is:
`SCHMA`
`HEAD.SOUR` payload is itself a URI as suggested by https://gedcom.io/specifications/FamilySearchGEDCOMv7.html#HEAD-SOUR, construct the extension URI as: `HEAD.SOUR` payload / `extTag`
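As a minimal sketch of the one case spelled out above (the `HEAD.SOUR` payload is itself a URI), the construction might look like this in Python. The function name `extension_uri` and the example.com payload are hypothetical, the URI check is only an approximation, and reading the " / " in the proposal as a literal slash separator is an assumption. The other cases of the proposal are not covered here.

```python
from urllib.parse import urlparse


def extension_uri(head_sour_payload, ext_tag):
    """Return '<HEAD.SOUR payload>/<extTag>' when the payload looks like a URI, else None."""
    parsed = urlparse(head_sour_payload)
    # Assumption: "is itself a URI" is approximated as "has a scheme and a network location".
    if not (parsed.scheme and parsed.netloc):
        return None
    # Avoid a doubled slash if the payload already ends with one.
    return head_sour_payload.rstrip("/") + "/" + ext_tag


uri = extension_uri("https://example.com/myapp", "_TODO")
no_uri = extension_uri("MYAPP", "_TODO")
```

Because the application's own URI is the prefix, two applications that both use `_TODO` end up with distinct URIs without any change to existing GEDCOM files.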