exercism / problem-specifications

Shared metadata for exercism exercises.
MIT License
326 stars 541 forks source link

Should we use $Schema in config.json? #1972

Open wolf99 opened 2 years ago

wolf99 commented 2 years ago

Pretty much as the title says, should we add usage of $Schema to the track config.js and exercise .meta/config.js to reference the related schema?

SaschaMann commented 2 years ago

What would that do?

wolf99 commented 2 years ago

Aha, I should have mentioned that of course.

For maintainers, it means IDEs pick up the schema automatically and checks the file against it. For tooling, it could make validation easier - but I guess that's working fine as-is so far?

ynfle commented 2 years ago

IDEs (like VSCode) can also offer suggestions on keys and what type of data the keys expect

ErikSchierboom commented 2 years ago

I sort of like the idea, but I have two concerns:

  1. We would be duplicating the logic that's encoded in configlet. This could lead to configlet disapproving while the schema thinks the file is valid, and vice versa.
  2. I wonder how flexible JSON schema is, as not all rules are that straightforward: https://exercism.org/docs/building/configlet/lint#h-rules
wolf99 commented 2 years ago

For your points @ErikSchierboom :

  1. The benefit to Configlet would be that the validation logic could become much more generic. Configlet would only need to validate the data against the schema and be done (does Nim do this easily though?) Then changes to the effective schema require only changes to this actual schema schema rather than rejigging Configlet.
  2. I'm drawing up a quick example of what such a schema might look like. Doing a bit of research at the moment to get back into JSON schemas and the ones already in use within Exercism. I should have something after that and will know then if the rules can be covered.

Registering such schemata in https://github.com/schemastore/schemastore/ is one of the ways of having hem made available to IDE's. For example, see the GitHub workflow one added by GitHub: https://github.com/SchemaStore/schemastore/blob/master/src/schemas/json/github-workflow.json .

ErikSchierboom commented 2 years ago

The benefit to Configlet would be that the validation logic could become much more generic. Configlet would only need to validate the data against the schema and be done (does Nim do this easily though?)

I briefly checked support for JSON schema in Nim, and I don't think there's anything great. IIRC, I found one library that could do JSON schema validation but its error messages were quite unwieldy.

Then changes to the effective schema require only changes to this actual schema schema rather than rejigging Configlet.

True. As we wouldn't want to download the schema each time we do linting, we'd need to cache it.

My main concern with using JSON schema is that we'd some of the flexibility that we have now, where there can be complex rules with very specific and detailed descriptions. We also already have most of the rules implemented, so we'd not benefit from JSON schema much in that it would allow us to build something quicker.

@ee7 what do you think?

wolf99 commented 2 years ago

I will try to assess the Nim support a bit more then. As I mention here https://github.com/exercism/problem-specifications/issues/2020#issuecomment-1106734015, the benefit will be reduced if the tool cannot use the schema.

Another question, I notice in the code comments for configlet that there is mention of avoiding the PCRE Regex lib, do you know the reason behind this? It may impact how useful a schema could be too. Perhaps @SaschaMann or @ee7 would know either?

SaschaMann commented 2 years ago

Perhaps @SaschaMann or @ee7 would know either?

I know practically nothing about Nim, configlet, JSON schemas, and regex, sorry :D

ee7 commented 2 years ago

My take: I think it's fine to add a schema, to help people catch basic errors while editing. But I don't think configlet lint should use it, and people should be aware that configlet lint can error for something that your IDE says is OK.

I'm not a JSON schema expert, but I'd guess that we have rules (and theoretically could have new rules in the future) that are either awkward or impossible to express in JSON schema.

Can a schema express this?

The exercises.concept[].prerequisites value must be a non-empty array of strings if exercises.concept[].status is not equal to "deprecated", except for exactly one exercise which is allowed to have an empty array as its value

There will also be checks that require a network connection. For example:

The forked_from values must refer to actually implemented exercises

Here, to determine validity, you need an up to date list of every concept exercise on every track. Another one: checking that a UUID is unique across all of Exercism. (configlet lint doesn't actually check these yet, but it will eventually).

And if there's a single rule that a schema cannot check, then there's a difference between configlet lint and checking via schema alone. So configlet lint needs to do JSON linting without using the schema, and configlet lint needs to parse the JSON anyway, at which making it use the schema is questionable even if there was good Nim support for it (and there isn't, as far as I know).

configlet fmt and configlet sync also need to parse the JSON.

there is mention of avoiding the PCRE Regex lib, do you know the reason behind this?

We want configlet to be a zero-dependency binary, as far as possible. If we use the Nim stdlib re library, then the user needs to have PCRE installed. And I don't want to static link PCRE, since that would add a lot of binary bloat.

There is a regex package written in pure Nim, which would avoid the run-time dependency. But it's rarer to reach for regular expressions in Nim than in other languages anyway - scanf and scanTuple are usually nicer (see https://nim-lang.org/docs/strscans.html), and faster too.