Stranger6667 / jsonschema-rs

JSON Schema validation library
https://docs.rs/jsonschema
MIT License

Interest in a "lite" feature? #470

Open enkore opened 6 months ago

enkore commented 6 months ago

This is a very good library with solid coverage of the JSON Schema spec. However, both JSON and, especially, JSON Schema have a few corner cases that are particularly difficult to handle. JSON itself has the whole "what's a number, really?" thing, and JSON Schema has a few interesting corners in the spec like:

These add a considerable amount of code to jsonschema; fancy-regex and the "high-end" regex crate seem to add up to about 1 MB of code these days. Just the special case of multipleOf with a float multiple pulls in a bignum math library.

I suspect that many end-users don't use these corners of the spec (I know I don't) and could benefit from a "lite" feature set:

I did some haphazard exploration of adding this and it doesn't seem to be particularly annoying. The biggest change would be introducing a module (or just a bunch of #[cfg]-gated use declarations in lib.rs) to select between the different regex engines for the different use cases, since fancy-regex is only used for matching schema-supplied regexes, while regex is used for internal ones. lite would switch both of these to regex-lite. The remainder is pretty much just a few cfg attributes to disable some validators. I haven't looked at the tests yet, though.
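A minimal sketch of what such a selection module could look like. Everything here is an assumption for illustration: the module name `re`, the constant names, and the use of string constants as stand-ins for the real re-exports (which would be something like `pub use` lines pointing at regex, fancy-regex or regex-lite). It is not taken from the jsonschema source.

```rust
// Hypothetical facade module: the rest of the crate would only refer to
// `re::...` and never name a regex crate directly.
mod re {
    // With the (hypothetical) `lite` feature enabled, both roles share one engine.
    // In a real crate this module would re-export types from regex-lite.
    #[cfg(feature = "lite")]
    mod imp {
        pub const INTERNAL_ENGINE: &str = "regex-lite";
        pub const SCHEMA_ENGINE: &str = "regex-lite";
    }

    // Without `lite`, internal patterns use regex and schema-supplied
    // patterns use fancy-regex, as described above.
    #[cfg(not(feature = "lite"))]
    mod imp {
        pub const INTERNAL_ENGINE: &str = "regex";
        pub const SCHEMA_ENGINE: &str = "fancy-regex";
    }

    // Exactly one `imp` exists after cfg evaluation, so this always resolves.
    pub use self::imp::{INTERNAL_ENGINE, SCHEMA_ENGINE};
}

fn main() {
    println!("internal patterns: {}", re::INTERNAL_ENGINE);
    println!("schema patterns:   {}", re::SCHEMA_ENGINE);
}
```

Keeping the cfg attributes inside one module means the validators themselves would not need any conditional compilation for the engine choice, only for the keywords that get disabled.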

Stranger6667 commented 6 months ago

Hi!

I am generally in favor of enabling users to have fewer dependencies, especially ones that they might not always need. However, as far as I know, the common practice is to design features to be additive. How would that work with a possible lite feature? Does it mean that there will be something similar to TLS-related feature switches?

Btw, from the absolute numbers point of view, what are the exact improvements let's say on x64 linux / glibc?

fewer dependencies means less code to keep an eye on, quicker compile times and smaller artifacts

enkore commented 6 months ago

How this would work on the Cargo.toml level - I'm not really sure. Some crates do use mutually exclusive features for selecting different backends, e.g. https://github.com/rust-lang/flate2-rs#backends – this always seems to come with caveats, either errors if there are conflicting feature requirements in the dependency tree or ignoring some of them. I think it should be possible to write it in a way that specifying both full/standard features and "lite" would result in the former instead of an error. Probably something like this:

[features]
default = ["resolve-http", "resolve-file", "cli", "full"]
full = ["dep:regex", "dep:fancy_regex", "..."]
lite = ["dep:regex_lite"]

cli = ["clap"]
draft201909 = []
draft202012 = []

resolve-http = ["reqwest"]
resolve-file = []

And then resolving e.g. regex to regex_lite when the lite feature is specified without the full feature (#[cfg(all(feature = "lite", not(feature = "full")))]).
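The precedence rule can be written out as a small truth table. This is only a sketch: `RegexBackend` and `resolve` are hypothetical names, and treating "neither feature enabled" as the standard backend is an assumption (with the feature list sketched above, that combination would not pull in any regex dependency at all).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RegexBackend {
    Standard, // regex + fancy-regex
    Lite,     // regex-lite
}

// Mirrors cfg(all(feature = "lite", not(feature = "full"))):
// lite only wins when full is absent, so the two features never conflict.
fn resolve(full: bool, lite: bool) -> RegexBackend {
    if lite && !full {
        RegexBackend::Lite
    } else {
        // Assumption: anything else falls back to the standard engines.
        RegexBackend::Standard
    }
}

fn main() {
    for &(full, lite) in [(false, false), (false, true), (true, false), (true, true)].iter() {
        println!("full={:<5} lite={:<5} -> {:?}", full, lite, resolve(full, lite));
    }
    // The same decision evaluated against the features of the current build.
    let active = resolve(cfg!(feature = "full"), cfg!(feature = "lite"));
    println!("this build: {:?}", active);
}
```

Because a dependency anywhere in the tree that asks for full simply overrides lite, feature unification would produce a larger build rather than a compile error.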

As for specific numbers, here is an example in a CLI project with a fairly typical dependency tree: there's clap, serde and a few other crates in there. The incremental build time is for changing a file in the CLI project, not in jsonschema-rs.

| | w/o jsonschema | lite | 0.17.1 |
|---|---|---|---|
| Crates | 101 | 119 | 159 |
| Initial debug build | 42s | 55s | 75s (+36%) |
| Incremental debug build | 2.8s | 3.5s | 4.2s (+20%) |
| Debug binary size | 50M | 71M | 103M (+45%) |
| Release build | 26s | 32s | 42s (+31%) |
| Release binary size | 2M | 3.3M | 4.9M (+48%) |
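The percentages are consistent with comparing the 0.17.1 column against the lite column, rounded to the nearest whole percent. A quick check, with the values copied from the measurements above (`percent_increase` is a helper name made up for this sketch):

```rust
// Relative increase from `old` to `new`, rounded to the nearest whole percent.
fn percent_increase(old: f64, new: f64) -> i64 {
    ((new - old) / old * 100.0).round() as i64
}

fn main() {
    // (metric, lite, 0.17.1)
    let rows = [
        ("Initial debug build (s)", 55.0, 75.0),
        ("Incremental debug build (s)", 3.5, 4.2),
        ("Debug binary size (M)", 71.0, 103.0),
        ("Release build (s)", 32.0, 42.0),
        ("Release binary size (M)", 3.3, 4.9),
    ];
    for &(label, lite, full) in rows.iter() {
        println!("{:<28} +{}%", label, percent_increase(lite, full));
    }
}
```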

Interestingly, on AArch64 the binary size increases more: from 2.7M to 4.4M (+62%; the absolute increase is similar).

Stranger6667 commented 5 months ago

Awesome! Thank you for such a detailed report :) it would definitely be useful to have!