abravalheri / validate-pyproject

Validation library for simple check on `pyproject.toml`
https://validate-pyproject.readthedocs.io/
Mozilla Public License 2.0
115 stars 11 forks source link

Support for SchemaStore #110

Closed henryiii closed 7 months ago

henryiii commented 11 months ago

Would it make sense to support SchemaStore? Currently, there's only one subschema (Ruff), though I'm asking to see if Poetry can be pulled out as a subschema (currently part of the core schema for pyproject.toml) and there's a proposed addition for Hatch. I'm hoping to add my schemas there too, for VSCode and such support. It would be nice to have in validate-pyproject too, I think - you wouldn't have to install the various projects, just pull from one place (schemastore). I don't think it can be implemented as a third party plugin very easily, since you don't know what tool sections are provided beforehand (unless it was a regularly updated copy or something).

I don't have a good idea of how it would work, how someone would opt-in to it (I guess it might require getting the schemas from a URL? I'm not that sure how schemastore works or what would be ideal. I know there's also a repo.) Maybe a regular updated copy would not be too bad. Thoughts?

abravalheri commented 11 months ago

Hi @henryiii , in principle, I think this feature is OK.

Something that I was already considering before is to have some form of loading a schema from a local path (opt-in), so this could be a variation of that, e.g.:

validate-pyproject --schema tool_name=./file.schema.json other-tool-name=https://... ...
# or even a custom uri schema-store://tool-name

Something to have in mind is that fastjsonschema is compatible with draft 07, so we have to be careful with external sources that were not explicitly designed to interoperate with validate-pyproject.

Do you have any thoughts on that?

henryiii commented 11 months ago

SchemaStore is designed around draft 07. :)

I've been looking into this a bit more, and there's one fundamental issue we'd need to solve, but I think it's easy. I'll open a separate issue about that #111. (In an online conference for three days, so depends on how interesting/boring the talks are. :) )

henryiii commented 11 months ago

Is this still true:

https://github.com/abravalheri/validate-pyproject/blob/9c5d56cc2002f5a9cae15b1fbdc1e4366ab1b684/src/validate_pyproject/plugins/setuptools.schema.json#L7-L10

I'd like to also submit at least setuptools.schema.json to SchemaStore, but this seems like it needs updating. :)

abravalheri commented 11 months ago

Is this still true:

Oops. Fix in #112.

henryiii commented 11 months ago

Is "format" (https://json-schema.org/understanding-json-schema/reference/string#format) used in fastjsonschema? Or was it just part of some future work?

abravalheri commented 11 months ago

Hi @henryiii , fastjsonschema allows using format. If I am not wrong, there are some default ones, but I wrote a bunch of custom ones in the formats sub-module.

henryiii commented 11 months ago

Ahh! Not sure why I couldn’t find them via grep.

abravalheri commented 11 months ago

Maybe that is relevant in this context:


Side note:

BTW, thank you very much @henryiii for all the interest you have been showing in this project and all the contributions. I really appreciate that.

Should I go ahead and create a new release with the PRs that were already merged, or do you have any pending change that you have in mind?

henryiii commented 11 months ago

Probably fine to make a release. I'm going to work on making a contribution to SchemaStore, and that requires processing setuptools.schema.json quite a bit to make it standards compliant - no format, no $$description, etc. How often do you expect it to change? I'm building a little conversion script, but if changes are pretty rare, we could just make them twice if we need to in the future. In either case, though, it's unlikely to change the existing file unless I find any bugs with it.

I'd also like to try to work on this issue itself (validate-pyproject pulling schemas from SchemaStore), but that's fine for a future release.

abravalheri commented 11 months ago

it standards compliant - no format...

Just one minor comment: format is standard, but what to do with it is pretty much up to the tool. The "standard" (or docs website?) says:

By default, format is just an annotation and does not effect validation.

Optionally, validator implementations can provide a configuration option to enable format to function as an assertion rather than just an annotation. ... A JSON Schema validator will ignore any format type that it does not understand.

So, it should possible to keep it there for the sake of "semantics" (with the understanding that this way they will be more "informational" than "enforcement").

How often do you expect it to change? I'm building a little conversion script, but if changes are pretty rare, we could just make them twice if we need to in the future. In either case, though, it's unlikely to change the existing file unless I find any bugs with it.

It is been a while that it does not change... So I don't think it should change too much. But it will follow the changes in setuptools (there are some stuff like ext_modules that are in the back of my mind).

One option is to add this script to this repository and run it as part of the test suite to ensure compatibility[^1].

In the future, I was considering to have the schema files themselves directly on the setuptools repository to facilitate contributors (other than me) to introduce new configurations, but this is something open for debate.

[^1]: I am happy to add you as a maintainer to this repo if you are interested, BTW. You have always given great contributions.

henryiii commented 11 months ago

I think the problem is SchemaStore doesn't allow custom formats[^1], since it's used by about a dozen different editors, etc., and they aren't familiar with non-standard formats. Also things like https://github.com/abravalheri/validate-pyproject/blob/585a758ea57955894dacb65bb8071fe46158992d/src/validate_pyproject/plugins/setuptools.schema.json#L246-L247 kind of assume the validator knows how to use the format.

I think it's fine, we can just replace format with a ref to a regex in the conversion script.

Sure, I'm fine to be added, as long as you don't expect me to do too much - happy to help a bit. The main benefit, IMO, is I can fix PRs like #109, and I get to put it in the "My Projects" listing https://iscinumpy.gitlab.io. ;)

[^1]: Those are also missing "type": "string". There's a chance adding that would make SchemaStore happy.

abravalheri commented 11 months ago

Sure, I'm fine to be added, as long as you don't expect me to do too much - happy to help a bit.

No expectations 😝

Those are also missing "type": "string". There's a chance adding that would make SchemaStore happy.

I don't remember now, but I remember seeing in the reference docs at that point in time type being treated as string by default... but my memory is fuzzy. Anyway that was my mistake 😅

I think the problem is SchemaStore doesn't allow custom formats

That looks like a bug on SchemaStore... The "spec/website" explicitly say "By default, format is just an annotation and does not effect validation ... A JSON Schema validator will ignore any format type that it does not understand".

But anyway, if problematic, processing it out is a good option.

abravalheri commented 11 months ago

The reason why I decided to go with format in some cases rather than regex was also because some validations where tricky to express as regex (e.g. the PEP 508 is usually expressed in terms of a BNF grammar right?), or too long to be humanly understandable in a JSON format.

In other cases I could not find a good regex definition (like Python UTF-8 identifiers), or it would be better to use a validation function, so if a PEP updates some definition, we are not surprised with changes.

So in the end, I settled with format because it gave me just the right amount of "editing"/"comprehending error message" capabilities.

henryiii commented 11 months ago

Quick question: is this valid?

[tool.setuptools.packages.find]
where = "src" 

This is in one of the samples already in SchemaStore and doesn't pass validation (which would require

[tool.setuptools.packages.find]
where = ["src"] 

)

henryiii commented 11 months ago

Nope:

$ cookiecutter gh:scientific-python/cookie
...
  [8/9] Choose a build backend
    1 - Hatchling                      - Pure Python (recommended)
    2 - Flit-core                      - Pure Python (minimal)
    3 - PDM-backend                    - Pure Python
    4 - Whey                           - Pure Python
    5 - Poetry                         - Pure Python
    6 - Setuptools with pyproject.toml - Pure Python
    7 - Setuptools with setup.py       - Pure Python
    8 - Setuptools and pybind11        - Compiled C++
    9 - Scikit-build-core              - Compiled C++ (recommended)
    10 - Meson-python                  - Compiled C++ (also good)
    11 - Maturin                       - Compiled Rust (recommended)
    Choose from [1/2/3/4/5/6/7/8/9/10/11] (1): 6
...
$ cd package
$ vi pyproject.toml
$ pipx run build
...
ValueError: invalid pyproject.toml config: `tool.setuptools.packages`.
configuration error: `tool.setuptools.packages` must be valid exactly by one definition (0 matches found):

    - type: array
      items:
        type: string
        at least one of the following:
          - {format: 'python-module-name'}
          - {format: 'pep561-stub-name'}
    - type: table
      additional keys: False
      keys:
        'find':
          type: table
          additional keys: False
          keys:
            'where':
              type: array
              items: {type: string}
            'exclude':
              type: array
              items: {type: string}
            'include':
              type: array
              items: {type: string}
            'namespaces': {type: boolean}

ERROR Backend subprocess exited when trying to invoke get_requires_for_build_sdist

So the example in SchemaStore is wrong. :)

abravalheri commented 11 months ago

Setuptools expects to find an array there. I remember it being a suggestion from the discord server.

henryiii commented 11 months ago

FYI, my PR is here: https://github.com/SchemaStore/schemastore/pull/3307

henryiii commented 11 months ago

Seems like SchemaStore's validator is not the only one to not accept unknown formats:

$ validate-pyproject tests/invalid-examples/poetry/poetry-bad-multiline.toml --tools=http://json.schemastore.org/pyproject.json#properties/tool -vv
[INFO] validate_pyproject.api.load_builtin_plugin defines `tool.distutils` schema
[INFO] validate_pyproject.api.load_builtin_plugin defines `tool.setuptools` schema
[INFO] https://json.schemastore.org/cibuildwheel.json defines `tool.cibuildwheel` schema
[INFO] https://json.schemastore.org/ruff.json defines `tool.ruff` schema
[INFO] https://json.schemastore.org/scikit-build.json defines `tool.scikit-build` schema
[INFO] https://json.schemastore.org/poetry.json defines `tool.poetry` schema
[ERROR] JsonSchemaDefinitionException: Unknown format: uint8
...
fastjsonschema.exceptions.JsonSchemaDefinitionException: Unknown format: uint8

😅

PS: I did find the place in SchemaStore where I can ignore unknown formats.

henryiii commented 11 months ago

Okay, this is my current proposal, based on https://github.com/abravalheri/validate-pyproject/issues/110#issuecomment-1753201256:

Stage 1: we add a way to load external files, either via path or URL. I've made a draft implementation:

$ validate-pyproject tests/examples/poetry/poetry-complete.toml -t poetry=http://json.schemastore.org/poetry.json -vv
[INFO] validate_pyproject.api.load_builtin_plugin defines `tool.distutils` schema
[INFO] validate_pyproject.api.load_builtin_plugin defines `tool.setuptools` schema
[INFO] https://json.schemastore.org/poetry.json defines `tool.poetry` schema
Valid file: tests/examples/poetry/poetry-complete.toml
$ validate-pyproject tests/invalid-examples/poetry/poetry-bad-multiline.toml -t poetry=http://json.schemastore.org/poetry.json
Invalid file: tests/invalid-examples/poetry/poetry-bad-multiline.toml
[ERROR] `tool.poetry.description` must match pattern ^[^
]*$

Nested JSON docs can be loaded, too, like t poetry=http://json.schemastore.org/cibuildwheel.json#properties/tool/properties/cibuildwheel.

Stage 2 will be to add a way to load all sections, something like --tools http//json.schemastore.org/pyproject.json#properties/tool. This will read the file, get the properties/tool section, and load all the sections present, including resolving $refs, that are not already loaded as plugins. (Also testing in draft)

We could provide a --schemastore shortcut for the above with the canonical URL.

abravalheri commented 11 months ago

Seems like SchemaStore's validator is not the only one to not accept unknown formats:

$ validate-pyproject tests/invalid-examples/poetry/poetry-bad-multiline.toml --tools=http://json.schemastore.org/pyproject.json#properties/tool -vv
[INFO] validate_pyproject.api.load_builtin_plugin defines `tool.distutils` schema
[INFO] validate_pyproject.api.load_builtin_plugin defines `tool.setuptools` schema
[INFO] https://json.schemastore.org/cibuildwheel.json defines `tool.cibuildwheel` schema
[INFO] https://json.schemastore.org/ruff.json defines `tool.ruff` schema
[INFO] https://json.schemastore.org/scikit-build.json defines `tool.scikit-build` schema
[INFO] https://json.schemastore.org/poetry.json defines `tool.poetry` schema
[ERROR] JsonSchemaDefinitionException: Unknown format: uint8
...
fastjsonschema.exceptions.JsonSchemaDefinitionException: Unknown format: uint8

😅

PS: I did find the place in SchemaStore where I can ignore unknown formats.

Ouch... I think we can work with that to be compliant with the spec. Probably by modifying FORMAT_FUNCTIONS to use a defaultdict we can achieve a "lenient" by default behaviour.

I would also be happy to add int/uintX functions to the formats.py module. Do you think this approach is worthy?

abravalheri commented 11 months ago

Okay, this is my current proposal, based on #110 (comment):

Stage 1: we add a way to load external files, either via path or URL. I've made a draft implementation: ... Stage 2 will be to add a way to load all sections, something like --tools http//json.schemastore.org/pyproject.json#properties/tool. This will read the file, get the properties/tool section, and load all the sections present, including resolving $refs, that are not already loaded as plugins. We could provide a --schemastore shortcut for the above with the canonical URL.

I like this approach. Thank you very much @henryiii for taking the time to look into it.