Open augean opened 5 months ago
7.0 has a machine-readable schema.
The line syntax is defined in the spec using ABNF, which is automatically extracted as grammar.abnf. At least one of the public GEDCOM parsers uses this grammar to parse lines.
The structure hierarchy is defined in the spec using a machine-readable variant of the metasyntax created for 5.0, which is automatically extracted as grammar.gedstruct. The structure hierarchy is also converted to a different machine-readable form as a set of YAML files hosted in several places, including the URI of each structure type (e.g. https://gedcom.io/terms/v7/ABBR) and a separate repository of both standard and extension structures (GEDCOM-registries). Multiple public GEDCOM parsers and development aids use one or both of these to parse and validate structure hierarchies.
These machine-parseable formats are not perfect (for example, we lack a machine-parseable way of marking something as deprecated) and we'd welcome suggestions on how to improve them. I did not look at your attached files closely enough to know if you have features the standard currently lacks.
Discussion in GEDCOM Steering Committee 1/18/2024: We have a machine-readable schema. We have machine-readable positive test cases in the GEDCOM.io repository. We currently don't have machine-readable negative test cases, such as those that appear in PrimSection.txt. Would others find those useful to have somewhere?
Closing since original question has been answered, and follow-up discussion can be done in https://github.com/FamilySearch/GEDCOM/discussions/422
1) All the comments are stripped out of the machine-readable schema. The comments are VERY important to keep in; the schema is very difficult to use without them. (Please see ged.5.1.1.txt, where I maintained the comments in the machine-readable form.)
2) There is no machine-readable file with regular expressions and comments defining the primitive types. Please see my PrimSection.txt, where I have the primitive types along with descriptions, regular expressions, and examples!
3) The spec is fragmented across too many different files, making it very complex to parse. (Please see the attached files, where I used just 2 files.)
Citing the above 3 reasons, I think the schema is not fully machine-readable:
- very important information like comments is left out of the machine-readable version
- the regular expressions, which are critical, are left out of any machine-readable version
- the spec is fragmented across too many files
Please review the attached ged.5.1.1.txt and PrimSection.txt, which show how the above issues could be fixed and would allow us to have a fully machine-readable GEDCOM 7 spec.
Also, please advise: is it possible to reopen the issue? I don't want to make a nuisance of myself, but I think the underlying issues are not resolved (see above). At present, issues are closed without any input from me, the person who originally logged the issue, and GitHub doesn't allow me to reopen it. Thanks!
abnf and gedstruct and HTML pre elements with class="sourceCode abnf" and class="sourceCode gedstruct", respectively.
cat extracted_files/tags/* > all.yaml
cat specification/gedcom-*md > specification.md; character-level syntax is delimited by blocks that start "```abnf" and end "```", and structure-level metasyntax is delimited by blocks that start "```gedstruct" and end "```".
We closed the issue because everything you asked for (machine-readability) is already provided. I still believe that's the case, but you've asked for more things (regular expressions and comments) so I'll re-open it for now to see if further conversation prompts identifying an issue that we should resolve.
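As an illustration of the block delimiters described above, here is a minimal Python sketch that pulls the abnf and gedstruct blocks out of a Markdown file such as the concatenated specification.md. The function name `extract_blocks` and the `FENCE` constant are hypothetical names chosen for this example; this is not the extraction script the spec repository itself uses.

```python
FENCE = "`" * 3  # three backticks, spelled this way so the example can sit inside a fence


def extract_blocks(markdown_text, languages=("abnf", "gedstruct")):
    """Collect the bodies of fenced blocks whose language tag is in `languages`.

    Returns a dict mapping each language to a list of block bodies, in
    document order. Blocks tagged with any other language are ignored.
    """
    blocks = {lang: [] for lang in languages}
    current = None  # language of the block being read, or None when outside one
    buffer = []
    for line in markdown_text.splitlines():
        stripped = line.strip()
        if current is None:
            # An opening fence looks like the three backticks followed by the tag.
            if stripped.startswith(FENCE) and stripped[len(FENCE):] in languages:
                current = stripped[len(FENCE):]
                buffer = []
        elif stripped == FENCE:
            # A bare fence closes the block we are currently reading.
            blocks[current].append("\n".join(buffer))
            current = None
        else:
            buffer.append(line)
    return blocks
```

Reading specification.md and passing its text to `extract_blocks` would then give the character-level grammar under the "abnf" key and the structure-level metasyntax under the "gedstruct" key, with the surrounding prose left behind in the Markdown.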
Thanks for the feedback; I will take a further look. But comments are very important, as they are used in genealogy tools, which are built off machine-readable schemas. I just think that we should maintain the comments in the machine-readable version.
For example, in the Augean tool, I use comments extensively when editing GEDCOM.
The YAML files work fine, thanks; I was able to parse all of them. So please ignore my comment about too many files.
so, two issues would be
Discussion 1/25/2024: We believe there are three separate issues worth discussing/pursuing here:
We don't think we need "regular expressions" per se because they can be derived from ABNF and because there are multiple different regex syntaxes used by various tools and libraries, so even if we picked one style, others would have to convert them anyway.
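To make the "derived from ABNF" point concrete, here is a hedged sketch of translating two small rules by hand into Python's `re` syntax. The ABNF text in the comments is a paraphrase for illustration and should be treated as an assumption; grammar.abnf is the authoritative source. The names `LEVEL_RE` and `TAG_RE` are hypothetical.

```python
import re

# ABNF (illustrative): Level = "0" / nonzero *digit
# Alternation "/" becomes "|", and "*digit" becomes "[0-9]*".
LEVEL_RE = re.compile(r"0|[1-9][0-9]*")

# ABNF (illustrative): Tag     = stdTag / extTag
#                      stdTag  = ucletter *tagchar
#                      extTag  = underscore 1*tagchar
#                      tagchar = ucletter / digit / underscore
# "1*tagchar" (one or more) becomes "+"; each character class is inlined.
TAG_RE = re.compile(r"[A-Z][A-Z0-9_]*|_[A-Z0-9_]+")


def is_level(text):
    """True if the whole string matches the Level rule."""
    return LEVEL_RE.fullmatch(text) is not None


def is_tag(text):
    """True if the whole string matches the Tag rule."""
    return TAG_RE.fullmatch(text) is not None
```

The same rules would need slightly different spellings in other regex dialects (POSIX, PCRE, ECMAScript), which is the conversion burden mentioned above.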
Please let us know if we are missing anything or if you have other feedback.
User descriptions in YAML files will help a lot, thanks! Regular expressions in each YAML file would be the icing on the cake, but are not essential. Listing the files that are supposed to be machine-readable will help too, as I was originally confused by this. Thanks!
XML and JSON schemas are usually machine-readable. Requesting the same for GEDCOM 7.
Please see the attached files for an example of a machine-readable schema that I created for GEDCOM 5.1.1. This allows me to create new GEDCOM files easily, and is very easy for tools to interact with.
Please could we have a machine-readable GEDCOM 7 schema (please use the attached files as an example)? A single file (or even multiple files) that allows us to easily parse the GEDCOM.
ged.5.1.1.txt PrimSection.txt structure
thanks