chris48s / v8r

✔️ A command-line JSON and YAML validator that's on your wavelength
https://www.npmjs.com/package/v8r
MIT License

Perform network I/O in parallel when validating multiple files #455

Open chris48s opened 1 month ago

chris48s commented 1 month ago

If I ask v8r to validate multiple files (e.g. v8r *.json), it works through the files in sequence, fetching the schema for each one and then validating the file. Caching the catalog and schemas speeds things up a bit, particularly if we are validating lots of files against the same schema. However, there is scope to make this a lot faster by doing things in parallel. Fetching schemas and resolving schema references in particular is I/O-bound, so it lends itself to being done in parallel.

Probably the ideal workflow here is something like:

Some possible problems:

This is quite a big/fiddly project, but it could make v8r a lot faster in some situations.
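
To make the "fetch in parallel" idea concrete, here is a minimal sketch. It is not v8r's code: v8r is a Node.js tool, and this uses Python's asyncio only as a compact way to show the shape of the approach. The names fetch_schema and fetch_all, the example.com URLs, and the simulated network delay are all assumptions made for illustration. The sketch collects the distinct schema URLs, fetches them concurrently, and then maps each file back to its schema.

```python
import asyncio

# Counts simulated requests so the deduplication is observable.
FETCH_COUNT = 0


async def fetch_schema(url: str) -> dict:
    """Hypothetical stand-in for an HTTP request (no real network I/O)."""
    global FETCH_COUNT
    FETCH_COUNT += 1
    await asyncio.sleep(0.05)  # pretend network latency
    return {"$id": url}


async def fetch_all(file_to_schema_url: dict[str, str]) -> dict[str, dict]:
    """Fetch each distinct schema once, with all requests in flight together."""
    urls = sorted(set(file_to_schema_url.values()))
    schemas = await asyncio.gather(*(fetch_schema(u) for u in urls))
    by_url = dict(zip(urls, schemas))
    # Map every file back to the (shared) schema object for its URL.
    return {path: by_url[url] for path, url in file_to_schema_url.items()}


# Hypothetical input: two files share a schema, one uses a different schema.
files = {
    "a.json": "https://example.com/schema-one.json",
    "b.json": "https://example.com/schema-one.json",
    "c.json": "https://example.com/schema-two.json",
}
resolved = asyncio.run(fetch_all(files))
```

Because the requests overlap, the waiting time is roughly that of the slowest single fetch rather than the sum of all of them. Validation itself can then run over the files once their schemas are available.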

chris48s commented 1 month ago

I think race conditions on cache writes don't really matter that much. The most naive approach would be to just do nothing.

In that case, we might make 2 requests for the same thing at roughly the same time. In the most common case, we get the same result both times and write it twice. The second write sets a slightly later timestamp.

I guess the worst case scenario is that this happens just as the upstream resource changes. In that case, we get 2 different responses and different bits of the validation within the same run are using 2 slightly different versions of the same schema. In principle, you could probably reproduce that with the current setup using ttl=0 or a really short ttl though.

I think I'm happy enough with that edge case not to implement any kind of special locking or anything for the cache though.
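
The "do nothing" behaviour described above can be sketched as follows. This is an illustration under assumptions, not v8r's cache implementation: the in-memory cache dict, the cached_fetch and fetch helpers, the counter used in place of a real clock, and the example.com URL are all hypothetical, and Python's asyncio is used only to keep the example short.

```python
import asyncio
import itertools

cache: dict[str, dict] = {}  # url -> {"body": ..., "timestamp": ...}
clock = itertools.count(1)   # deterministic stand-in for wall-clock time
requests_made = 0


async def fetch(url: str) -> str:
    """Hypothetical stand-in for an HTTP request (no real network I/O)."""
    global requests_made
    requests_made += 1
    await asyncio.sleep(0.01)  # pretend network latency
    return "schema-v1"


async def cached_fetch(url: str) -> str:
    """Read-through cache with no locking around the write."""
    if url in cache:
        return cache[url]["body"]
    # Both concurrent callers get here before either has written to the cache.
    body = await fetch(url)
    cache[url] = {"body": body, "timestamp": next(clock)}
    return body


async def main() -> list[str]:
    url = "https://example.com/schema.json"
    return await asyncio.gather(cached_fetch(url), cached_fetch(url))


results = asyncio.run(main())
```

In this run both callers miss the cache, two requests are made, both get the same body, and the second write simply replaces the first with a later timestamp. If the upstream resource changed between the two requests, the two callers would return different bodies, which is the edge case being accepted here.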