Closed henryiii closed 7 months ago
Hi @henryiii , in principle, I think this feature is OK.
Something that I was already considering before is to have some form of loading a schema from a local path (opt-in), so this could be a variation of that, e.g.:
validate-pyproject --schema tool_name=./file.schema.json other-tool-name=https://... ...
# or even a custom uri schema-store://tool-name
Something to have in mind is that fastjsonschema
is compatible with draft 07, so we have to be careful with external sources that were not explicitly designed to interoperate with validate-pyproject
.
Do you have any thoughts on that?
SchemaStore is designed around draft 07. :)
I've been looking into this a bit more, and there's one fundamental issue we'd need to solve, but I think it's easy. I'll open a separate issue about that #111. (In an online conference for three days, so depends on how interesting/boring the talks are. :) )
Is this still true:
I'd like to also submit at least setuptools.schema.json to SchemaStore, but this seems like it needs updating. :)
Is this still true:
Oops. Fix in #112.
Is "format"
(https://json-schema.org/understanding-json-schema/reference/string#format) used in fastjsonschema? Or was it just part of some future work?
Hi @henryiii , fastjsonschema
allows using format
.
If I am not wrong, there are some default ones, but I wrote a bunch of custom ones in the formats
sub-module.
Ahh! Not sure why I couldn’t find them via grep.
Maybe that is relevant in this context:
Side note:
BTW, thank you very much @henryiii for all the interest you have been showing in this project and all the contributions. I really appreciate that.
Should I go ahead and create a new release with the PRs that were already merged, or do you have any pending change that you have in mind?
Probably fine to make a release. I'm going to work on making a contribution to SchemaStore, and that requires processing setuptools.schema.json quite a bit to make it standards compliant - no format
, no $$description
, etc. How often do you expect it to change? I'm building a little conversion script, but if changes are pretty rare, we could just make them twice if we need to in the future. In either case, though, it's unlikely to change the existing file unless I find any bugs with it.
I'd also like to try to work on this issue itself (validate-pyproject pulling schemas from SchemaStore), but that's fine for a future release.
it standards compliant - no
format
...
Just one minor comment: format
is standard, but what to do with it is pretty much up to the tool. The "standard" (or docs website?) says:
By default, format is just an annotation and does not effect validation.
Optionally, validator implementations can provide a configuration option to enable format to function as an assertion rather than just an annotation. ... A JSON Schema validator will ignore any format type that it does not understand.
So, it should possible to keep it there for the sake of "semantics" (with the understanding that this way they will be more "informational" than "enforcement").
How often do you expect it to change? I'm building a little conversion script, but if changes are pretty rare, we could just make them twice if we need to in the future. In either case, though, it's unlikely to change the existing file unless I find any bugs with it.
It is been a while that it does not change... So I don't think it should change too much.
But it will follow the changes in setuptools
(there are some stuff like ext_modules
that are in the back of my mind).
One option is to add this script to this repository and run it as part of the test suite to ensure compatibility[^1].
In the future, I was considering to have the schema files themselves directly on the setuptools
repository to facilitate contributors (other than me) to introduce new configurations, but this is something open for debate.
[^1]: I am happy to add you as a maintainer to this repo if you are interested, BTW. You have always given great contributions.
I think the problem is SchemaStore doesn't allow custom formats[^1], since it's used by about a dozen different editors, etc., and they aren't familiar with non-standard formats. Also things like https://github.com/abravalheri/validate-pyproject/blob/585a758ea57955894dacb65bb8071fe46158992d/src/validate_pyproject/plugins/setuptools.schema.json#L246-L247 kind of assume the validator knows how to use the format.
I think it's fine, we can just replace format with a ref to a regex in the conversion script.
Sure, I'm fine to be added, as long as you don't expect me to do too much - happy to help a bit. The main benefit, IMO, is I can fix PRs like #109, and I get to put it in the "My Projects" listing https://iscinumpy.gitlab.io. ;)
[^1]: Those are also missing "type": "string"
. There's a chance adding that would make SchemaStore happy.
Sure, I'm fine to be added, as long as you don't expect me to do too much - happy to help a bit.
No expectations 😝
Those are also missing "type": "string". There's a chance adding that would make SchemaStore happy.
I don't remember now, but I remember seeing in the reference docs at that point in time type
being treated as string
by default... but my memory is fuzzy. Anyway that was my mistake 😅
I think the problem is SchemaStore doesn't allow custom formats
That looks like a bug on SchemaStore... The "spec/website" explicitly say "By default, format is just an annotation and does not effect validation ... A JSON Schema validator will ignore any format type that it does not understand".
But anyway, if problematic, processing it out is a good option.
The reason why I decided to go with format
in some cases rather than regex
was also because some validations where tricky to express as regex (e.g. the PEP 508 is usually expressed in terms of a BNF grammar right?), or too long to be humanly understandable in a JSON format.
In other cases I could not find a good regex definition (like Python UTF-8 identifiers), or it would be better to use a validation function, so if a PEP updates some definition, we are not surprised with changes.
So in the end, I settled with format
because it gave me just the right amount of "editing"/"comprehending error message" capabilities.
Quick question: is this valid?
[tool.setuptools.packages.find]
where = "src"
This is in one of the samples already in SchemaStore and doesn't pass validation (which would require
[tool.setuptools.packages.find]
where = ["src"]
)
Nope:
$ cookiecutter gh:scientific-python/cookie
...
[8/9] Choose a build backend
1 - Hatchling - Pure Python (recommended)
2 - Flit-core - Pure Python (minimal)
3 - PDM-backend - Pure Python
4 - Whey - Pure Python
5 - Poetry - Pure Python
6 - Setuptools with pyproject.toml - Pure Python
7 - Setuptools with setup.py - Pure Python
8 - Setuptools and pybind11 - Compiled C++
9 - Scikit-build-core - Compiled C++ (recommended)
10 - Meson-python - Compiled C++ (also good)
11 - Maturin - Compiled Rust (recommended)
Choose from [1/2/3/4/5/6/7/8/9/10/11] (1): 6
...
$ cd package
$ vi pyproject.toml
$ pipx run build
...
ValueError: invalid pyproject.toml config: `tool.setuptools.packages`.
configuration error: `tool.setuptools.packages` must be valid exactly by one definition (0 matches found):
- type: array
items:
type: string
at least one of the following:
- {format: 'python-module-name'}
- {format: 'pep561-stub-name'}
- type: table
additional keys: False
keys:
'find':
type: table
additional keys: False
keys:
'where':
type: array
items: {type: string}
'exclude':
type: array
items: {type: string}
'include':
type: array
items: {type: string}
'namespaces': {type: boolean}
ERROR Backend subprocess exited when trying to invoke get_requires_for_build_sdist
So the example in SchemaStore is wrong. :)
Setuptools expects to find an array there. I remember it being a suggestion from the discord server.
FYI, my PR is here: https://github.com/SchemaStore/schemastore/pull/3307
Seems like SchemaStore's validator is not the only one to not accept unknown format
s:
$ validate-pyproject tests/invalid-examples/poetry/poetry-bad-multiline.toml --tools=http://json.schemastore.org/pyproject.json#properties/tool -vv
[INFO] validate_pyproject.api.load_builtin_plugin defines `tool.distutils` schema
[INFO] validate_pyproject.api.load_builtin_plugin defines `tool.setuptools` schema
[INFO] https://json.schemastore.org/cibuildwheel.json defines `tool.cibuildwheel` schema
[INFO] https://json.schemastore.org/ruff.json defines `tool.ruff` schema
[INFO] https://json.schemastore.org/scikit-build.json defines `tool.scikit-build` schema
[INFO] https://json.schemastore.org/poetry.json defines `tool.poetry` schema
[ERROR] JsonSchemaDefinitionException: Unknown format: uint8
...
fastjsonschema.exceptions.JsonSchemaDefinitionException: Unknown format: uint8
😅
PS: I did find the place in SchemaStore where I can ignore unknown formats.
Okay, this is my current proposal, based on https://github.com/abravalheri/validate-pyproject/issues/110#issuecomment-1753201256:
Stage 1: we add a way to load external files, either via path or URL. I've made a draft implementation:
$ validate-pyproject tests/examples/poetry/poetry-complete.toml -t poetry=http://json.schemastore.org/poetry.json -vv
[INFO] validate_pyproject.api.load_builtin_plugin defines `tool.distutils` schema
[INFO] validate_pyproject.api.load_builtin_plugin defines `tool.setuptools` schema
[INFO] https://json.schemastore.org/poetry.json defines `tool.poetry` schema
Valid file: tests/examples/poetry/poetry-complete.toml
$ validate-pyproject tests/invalid-examples/poetry/poetry-bad-multiline.toml -t poetry=http://json.schemastore.org/poetry.json
Invalid file: tests/invalid-examples/poetry/poetry-bad-multiline.toml
[ERROR] `tool.poetry.description` must match pattern ^[^
]*$
Nested JSON docs can be loaded, too, like t poetry=http://json.schemastore.org/cibuildwheel.json#properties/tool/properties/cibuildwheel
.
Stage 2 will be to add a way to load all sections, something like --tools http//json.schemastore.org/pyproject.json#properties/tool
. This will read the file, get the properties/tool
section, and load all the sections present, including resolving $refs
, that are not already loaded as plugins. (Also testing in draft)
We could provide a --schemastore
shortcut for the above with the canonical URL.
Seems like SchemaStore's validator is not the only one to not accept unknown
format
s:$ validate-pyproject tests/invalid-examples/poetry/poetry-bad-multiline.toml --tools=http://json.schemastore.org/pyproject.json#properties/tool -vv [INFO] validate_pyproject.api.load_builtin_plugin defines `tool.distutils` schema [INFO] validate_pyproject.api.load_builtin_plugin defines `tool.setuptools` schema [INFO] https://json.schemastore.org/cibuildwheel.json defines `tool.cibuildwheel` schema [INFO] https://json.schemastore.org/ruff.json defines `tool.ruff` schema [INFO] https://json.schemastore.org/scikit-build.json defines `tool.scikit-build` schema [INFO] https://json.schemastore.org/poetry.json defines `tool.poetry` schema [ERROR] JsonSchemaDefinitionException: Unknown format: uint8 ... fastjsonschema.exceptions.JsonSchemaDefinitionException: Unknown format: uint8
😅
PS: I did find the place in SchemaStore where I can ignore unknown formats.
Ouch... I think we can work with that to be compliant with the spec.
Probably by modifying FORMAT_FUNCTIONS to use a defaultdict
we can achieve a "lenient" by default behaviour.
I would also be happy to add int/uintX
functions to the formats.py
module.
Do you think this approach is worthy?
Okay, this is my current proposal, based on #110 (comment):
Stage 1: we add a way to load external files, either via path or URL. I've made a draft implementation: ... Stage 2 will be to add a way to load all sections, something like --tools http//json.schemastore.org/pyproject.json#properties/tool. This will read the file, get the properties/tool section, and load all the sections present, including resolving $refs, that are not already loaded as plugins. We could provide a --schemastore shortcut for the above with the canonical URL.
I like this approach. Thank you very much @henryiii for taking the time to look into it.
Would it make sense to support SchemaStore? Currently, there's only one subschema (Ruff), though I'm asking to see if Poetry can be pulled out as a subschema (currently part of the core schema for pyproject.toml) and there's a proposed addition for Hatch. I'm hoping to add my schemas there too, for VSCode and such support. It would be nice to have in validate-pyproject too, I think - you wouldn't have to install the various projects, just pull from one place (schemastore). I don't think it can be implemented as a third party plugin very easily, since you don't know what tool sections are provided beforehand (unless it was a regularly updated copy or something).
I don't have a good idea of how it would work, how someone would opt-in to it (I guess it might require getting the schemas from a URL? I'm not that sure how schemastore works or what would be ideal. I know there's also a repo.) Maybe a regular updated copy would not be too bad. Thoughts?