eiffel-community / eiffel

The Eiffel framework vocabulary, descriptions, guides and schemas along with links to relevant implementation repositories.
Apache License 2.0
122 stars 59 forks source link

Generate docs and schemas from common schema definition format #315

Closed magnusbaeck closed 2 years ago

magnusbaeck commented 2 years ago

Applicable Issues

Fixes #282

Description of the Change

A new schema definition format that's a superset of the normal JSON schema format is introduced and all existing schemas and documentation files are merged into files in this new file format, stored in the "definitions" directory.

At the same time, the definitions of the meta member and event links are extracted to their own definition files and referenced from the event files. This both allows reuse but also makes it trivial for scripts like SDK generators to understand the structure better. Since both meta and links have evolved over time we've reconstructed different versions of those subschemas.

The schema files aren't affected at all by these changes, i.e. running generate_schemas.py on a definition file will produce a byte-identical schema file. This isn't quite true for the .md documentation; some minor changes are introduced:

To make sure the definition files don't diverge from the generated files we have an additional job in the GitHub Actions workflow that's run for all pushes and PRs. It runs the generation scripts and verifies that we won't get any dirty files in the workspace and no untracked files. See https://github.com/magnusbaeck/eiffel/actions/runs/3007678382 and https://github.com/magnusbaeck/eiffel/actions/runs/3007462169 for examples of what violations look like.

Alternate Designs

A few other options were discussed in the associated issue.

Benefits

The current representation of event schemas and documentation is an obstacle for protocol maintainers and others who want to process the information with a program. Making this change would greatly improve the situation.

Possible Drawbacks

None, I think.

Sign-off

Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or

(b) The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or

(c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it.

(d) I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved.

Signed-off-by: Magnus Bäck \magnus.back@axis.com

e-backmark-ericsson commented 2 years ago

Sorry for raising this concern now, I'm either forgetting some good argumentation we've had or I've understood the consequences of this change too late.

With this solution we still need to create new files for a new version of each event. I do believe the reason for having individual json schemas for all historical event version stored in the repo was to make it easy to reference those schemas for all historical versions through one single branch of the repo. Now that we move from having the json schemas as the source of truth, and instead generate them for each new release (or each new version of the events?), do we still need this or could we let the yaml files be the same and change them for each new version instead? It would make reviewing of protocol changes so much easier if we don't create a new file for each new version.

magnusbaeck commented 2 years ago

It seems like orthogonal questions:

Schema files need to be regenerated and committed along with schema the definition file so this PR doesn't change the structure of the repo and how it's consumed.

e-backmark-ericsson commented 2 years ago

Well the first question is what this PR and its related issue is all about, so that is of no concern I believe. The second question has a follow up question hidden in it - how do we make sure the generated json files are aligned with the commited yaml files? If we don't at the same time introduce a new structure, for example by not checking in those json schemas but instead generate them at merge of a PR, we instead need to add a sanity test to verify that the manually generated json schemas match the commited yaml files. It won't be sustainable to review such differences manually for each change to the protocol. And I believe that the individual json schemas per version could still be generated even if we don't create new yaml files for each version, by reading old commits from the repo, or?

e-backmark-ericsson commented 2 years ago

As long as the validation of the generated json files match the yaml files is part of the Github action checking any protocol PRs, we don't need to introduce any more improvements through this PR. In the long run we might want to generate an external documentation site for our events, and then we will have a lot of new requirements to deal with there.

z-sztrom commented 2 years ago

Works fine for me. Both schema and documentation generated properly.

z-sztrom commented 2 years ago

I cannot review the changes. No enough permissions?

magnusbaeck commented 2 years ago

Weird, you should be able to review the PR (but since you're not a maintainer any approval you make won't count towards any minimum number of approvals).

magnusbaeck commented 2 years ago

One thing I don't think we've discussed is what to do about changes to the YAML files that don't affect the generated schemas, like documentation changes and changes in the links. For example, #287 adds a new link type to a few events types which in that PR amounts to a mere documentation change, but when merging that change into this PR I can only assume that we need to issue new event versions (with a minor version bump)? Those schemas would currently be identical to the preceding versions.

magnusbaeck commented 2 years ago

No, wait. When merging the PRECURSOR branch (or master, after that PR is merged) into this PR branch the current events should be regarded as settled law and we obviously shouldn't issue new event versions after the fact, but what should we do the next time someone adds a link type or makes a similar non-trivial change to a schema definition file that doesn't affect the schema?

magnusbaeck commented 2 years ago

This PR is now up to date with @e-backmark-ericsson's recently merged PRECURSOR change so it's ready for a review. Let me know if there would be any point in a "group review" where we go through things together.

e-backmark-ericsson commented 2 years ago

No, wait. When merging the PRECURSOR branch (or master, after that PR is merged) into this PR branch the current events should be regarded as settled law and we obviously shouldn't issue new event versions after the fact, but what should we do the next time someone adds a link type or makes a similar non-trivial change to a schema definition file that doesn't affect the schema?

I'd say that the minor version of the event should be stepped if there is a link-only update to it, even though we (currently) don't check such things in the schemas. The reason for not having that as part of the schemas is that when the schemas were originally created there were no known means to validate that a list of dictionaries actually had certain keys in their dictionaries. I believe that more recent json schema version can handle that, and as soon as we add such restrictions to the schemas they will need to be updated also for link-only updates in the events.

magnusbaeck commented 2 years ago

I'd say that the minor version of the event should be stepped if there is a link-only update to it, even though we (currently) don't check such things in the schemas.

Yeah, I agree. I'll update eiffel-syntax-and-usage/versioning.md to document this.

Regarding the comments on the copyright headers - I believe the correct way to handle those is to actually move the copyright headers out of the generated md files and instead provide them as comments in the yml files. There should really bee no need to make them part of the generated md files by adding them as a key with underscore to the yml file structure. The actual copyright should hold for the file that is manually edited (the yml file) and not for the generated files.

Yep. There was no good reason for omitting the copyright notice from the YAML files. We could generate a file header for the .md files that states the origin of the file, something like this:

<!--
This file was generated from ../definitions/EiffelCompositionDefinedEvent/3.2.0.yml.
See that file for a copyright notice.
-->
magnusbaeck commented 2 years ago

Yeah, I agree. I'll update eiffel-syntax-and-usage/versioning.md to document this.

Done in 0fef4211126e7e628bc1fc6f50359b5a5a156fd3.

Yep. There was no good reason for omitting the copyright notice from the YAML files. We could generate a file header for the .md files that states the origin of the file, something like this:

Done in 28abb25871ead9c6da4e36b9bacef615be29134d.

magnusbaeck commented 2 years ago

@z-sztrom, this PR has been merged now so feel free to rebase and upload your schema URL PR whenever you have time.