cps-org / cps

Common Package Specification — A cross-tool mechanism for locating software dependencies
https://cps-org.github.io/cps/

Generate JSON schema #63

Closed mwoehlke closed 1 month ago

mwoehlke commented 2 months ago

Add machinery to generate a JSON schema from the documentation. The schema can be used to validate CPS JSON.

As a side benefit to changing how attributes are declared to be more machine-readable, it is also slightly easier to specify common properties of an attribute.

Supersedes #25. Unlike that proposal, this retains reST (with only modest changes aside from all descriptions gaining a level of indent) as the source of truth, so does not introduce new errors and retains superior editability. It should also be somewhat more robust, as there is less human-written textual duplication.

mwoehlke commented 2 months ago

Note: I think the generated schema is okay, and Python jsonschema seems happy enough with it, but feedback from anyone more familiar with JSON schemas and/or using them for validation would be appreciated.

autoantwort commented 2 months ago

Could you add the default version to the schema and ideally the example values?

autoantwort commented 2 months ago

Currently the description of the platform attribute is duplicated and as a result the longer description is ignored.

The format of the website should be "format": "uri"

version_schema should use enum.

Nitpick, but can you arrange the properties in a meaningful order?

Can you commit the schema?

mwoehlke commented 1 month ago

Could you add the default version to the schema and ideally the example values?

@autoantwort, thanks for the feedback! (Also, sorry about the delay in replying.) I'm not following you here, though; are you talking about cps_version? I assume by "default value" you mean I should copy in the version from conf.py? I'm not very familiar with JSON schema, so it would help if you could express this as what you want to see changed in the resulting schema.

That said, I may punt on this (and other changes) in the interest of making progress. It will be beneficial to have something even if it's not perfect, but we can and should continue to make refinements.

Can you arrange the properties in a meaningful order?

What is "meaningful"? Right now they are in the order "core", "supplemental", with each of those in lexicographical order. It should be straightforward to remove the core/supplemental split and have purely lexicographical order, but anything else is difficult to specify and maintain.

Anyway, the intent is that the schema is not the canonical source of truth, but a useful tool mostly targeting automated uses.
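To illustrate the ordering described above, here is a minimal Python sketch. The `CORE` set and the attribute names are placeholders chosen for the example; the generator derives the real core/supplemental classification from the reST sources, and this is not its actual code.

```python
# Hypothetical classification, for illustration only.
CORE = {"name", "cps_version", "components"}


def ordered(properties: dict) -> dict:
    """Return properties with core attributes first, then supplemental
    ones, each group in lexicographical order."""
    keys = sorted(properties, key=lambda k: (k not in CORE, k))
    return {k: properties[k] for k in keys}


props = {"website": {}, "name": {}, "license": {}, "components": {}}
print(list(ordered(props)))  # ['components', 'name', 'license', 'website']
```

Dropping the `k not in CORE` part of the key would give the purely lexicographical order mentioned as the alternative.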

The format of the website should be "format": "uri"

Do you mean this?

--- a/cps.schema.json
+++ b/cps.schema.json
@@ -488,6 +488,7 @@
             },
             "website@0": {
                 "description": "Specifies the URI at which the package's website may be found.",
+                "format": "uri",
                 "$ref": "#/definitions/types/string"
             }
         }

(Note: I ran the generated schema through python -m json.tool to get the above. I'm on the fence whether the canonical version should be pretty-printed. It's a trivial code change, but also makes the file much larger for debatable benefit. It's easy enough to run the generated version through a pretty-printer.)
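As a rough sketch of the trade-off, the following Python snippet serializes the same schema fragment both compactly and with the four-space indent that `python -m json.tool` uses by default. The fragment is taken from the diff above; the helper names are made up for the example and say nothing about how the generator actually writes its output.

```python
import json


def compact(schema: dict) -> str:
    """Serialize without any optional whitespace."""
    return json.dumps(schema, separators=(",", ":"))


def pretty(schema: dict) -> str:
    """Serialize with a four-space indent, like `python -m json.tool`."""
    return json.dumps(schema, indent=4)


fragment = {
    "website@0": {
        "description": "Specifies the URI at which the package's website may be found.",
        "format": "uri",
        "$ref": "#/definitions/types/string",
    }
}

# Both forms parse back to the same data; only the size differs.
print(len(compact(fragment)), len(pretty(fragment)))
```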

Can you commit the schema?

I'd rather not; checking in generated files is generally undesirable, and I don't see how it's especially helpful when, once this lands, the schema will be available via the published pages. I'd really prefer to avoid the added workflow pain of trying to keep a generated-but-checked-in file up to date.

If you just want to know what it looks like without having to generate it yourself, see https://github.com/cps-org/cps/files/15203794/cps.schema.json. Once/if this lands, it'll be accessible via the published Pages (see also reply to Bret, below).

It would be handy to publish the schema somewhere

Once this lands, it should be accessible from the same location as the rest of the published Pages. Adding a link should be done, but that can be a follow-up.

Eventually, it would be nice to see how the schema differs between the base and head commits of a PR to confirm that we're matching roughly semver expectations with respect to the CPS version number

That's... technically doable. Generating a diff from GHA should be easy if we can use the published copy as the comparison target. (Otherwise we need a second build, which is somewhat annoying.) Presenting said diff in a useful way might be challenging. (I agree we shouldn't do that in this PR, especially as I'd prefer to use the published copy as the "original", which means generation has to land before comparison can be tested.)
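One possible way to produce such a diff, sketched in Python with only the standard library: normalize both schemas (sorted keys, fixed indent) so that formatting differences do not show up, then emit a unified diff. The function names and the `published/` and `generated/` labels are assumptions for the example, and fetching the published copy is left out.

```python
import difflib
import json


def normalize(schema: dict) -> list:
    """Pretty-print with sorted keys so only real changes show up."""
    return json.dumps(schema, indent=4, sort_keys=True).splitlines()


def schema_diff(published: dict, generated: dict) -> str:
    """Return a unified diff between two schemas ('' if they are equal)."""
    lines = difflib.unified_diff(
        normalize(published),
        normalize(generated),
        fromfile="published/cps.schema.json",  # hypothetical label
        tofile="generated/cps.schema.json",    # hypothetical label
        lineterm="",
    )
    return "\n".join(lines)


old = {"website": {"type": "string"}}
new = {"website": {"type": "string", "format": "uri"}}
print(schema_diff(old, new))
```

This only covers generating the diff; how to present it on a PR is a separate question.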

autoantwort commented 1 month ago

I assume by "default value" you mean I should copy in the version from conf.py?

Sorry, that was unclear. I mean the default value of a field. For example, for link_languages the default value is C, but this is not expressed in the schema.
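For illustration, JSON Schema has a `default` annotation keyword that could carry this information. The sketch below shows a hypothetical fragment and a small helper that fills in missing values. The exact shape of the `link_languages` entry, the `["c"]` spelling of the default, and the helper itself are assumptions made for the example, not output of the generator.

```python
import copy

# Hypothetical schema fragment; the "default" entry is the point of interest.
SCHEMA = {
    "properties": {
        "link_languages": {
            "type": "array",
            "items": {"type": "string"},
            "default": ["c"],  # assumed spelling of the documented default
        },
        "location": {"type": "string"},
    }
}


def with_defaults(instance: dict, schema: dict) -> dict:
    """Return a copy of instance with schema-declared defaults filled in."""
    result = dict(instance)
    for name, spec in schema.get("properties", {}).items():
        if "default" in spec and name not in result:
            result[name] = copy.deepcopy(spec["default"])
    return result


print(with_defaults({"location": "lib/libfoo.a"}, SCHEMA))
```

Note that `default` is only an annotation: validators generally do not insert the value themselves, so a consumer would still need something like the helper above.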

What is "meaningful"? The intent is that the schema is not the canonical source of truth, but a useful tool mostly targeting automated uses.

I still want/prefer the generated documentation (in addition) to the current form. But I can add this in a following PR.

Do you mean this?

Yes

I'm on the fence whether the canonical version should be pretty-printed.

IMO yes. It is a benefit to be able to read the file directly if you have to (ideally you never have to, but...).

the schema will be available via the published pages.

This is also ok. I only want a way so that I can provide the $schema url and add the url to https://github.com/SchemaStore/schemastore/blob/master/src/api/json/catalog.json. vcpkg for example has done this by committing the file: https://github.com/SchemaStore/schemastore/blob/master/src/api/json/catalog.json#L1260

Presenting said diff in a useful way might be challenging.

If you commit the schema you don't have the problem. And I guess you have to run the generator anyway to get the schema for the website, so adding a git diff --exit-code to the GitHub Actions workflow is enough to make the check fail if someone changes the schema but has not committed the updated file.
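A minimal sketch of that check, written as a Python helper rather than a workflow step. It relies on `git diff --exit-code` returning a non-zero status when the working tree differs from the index; the function name, the default path, and the injectable `run` parameter are assumptions for the example.

```python
import subprocess


def schema_is_committed(path: str = "cps.schema.json", run=subprocess.run) -> bool:
    """Return True if the regenerated file at `path` matches what is staged
    or committed, i.e. `git diff --exit-code` reports no differences."""
    result = run(["git", "diff", "--exit-code", "--", path])
    return result.returncode == 0
```

In CI this would be called after running the generator, failing the job when it returns False.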

mwoehlke commented 1 month ago

Okay, since Bret already :+1:'d this, I'm going to merge it as-is (with one tiny tweak; see below) so follow-ups are easier to review.

@autoantwort, I may not be able to get format working right away. I'm definitely not going to tackle publishing diffs to PRs immediately. Please be encouraged to open Issues for improvements you'd like to see; thanks!

I only want a way so that I can provide the $schema url [...]

Makes sense. The generated schema already has an $id, which is supposed to point at the published location (although the file doesn't exist there yet). It looks like I set things up to put the right $id in, then copied from autoantwort's PR and forgot to change it to the right location. I've fixed that now, so it should be correct once this lands.

Presenting said diff in a useful way might be challenging.

If you commit the schema you don't have the problem.

Perhaps, but I still don't think it's worth making it more complicated to make changes.

I still want/prefer the generated documentation (in addition) to the current form. But I can add this in a following PR.

Sure; in fact, I had this in the back of my head as a follow-up already. However, please make it part of make html if possible. At minimum, it needs to be reproducible in local builds, not just when deployed via GHA.