chris48s opened 1 month ago
I think race conditions on cache writes don't really matter that much. The most naive approach would be to just do nothing.
In that case, we might make 2 requests for the same thing at roughly the same time. In the most common case, we get the same result both times and write it twice. The second write sets a slightly later timestamp.
I guess the worst-case scenario is that this happens just as the upstream resource changes. In that case, we get 2 different responses, and different bits of the validation within the same run end up using 2 slightly different versions of the same schema. In principle, you could probably reproduce that with the current setup using ttl=0 or a really short ttl, though.
I think I'm happy enough to live with that edge case rather than implement any kind of special locking for the cache, though.
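To make the "do nothing" semantics concrete, here is a minimal sketch (not v8r's actual cache code; the URL and `writeCache` helper are made up for illustration) of two racing requests writing the same key. The last writer simply wins, and its slightly later timestamp becomes the entry's age:

```javascript
// Illustrative last-write-wins cache with no locking.
const cache = new Map();

function writeCache(url, schema, now) {
  // No coordination between writers: whoever writes last
  // overwrites the entry and sets the freshest timestamp.
  cache.set(url, { schema, cachedAt: now });
}

// Two "requests" for the same thing at roughly the same time.
const url = "https://example.com/schema.json"; // hypothetical URL
writeCache(url, { title: "v1" }, 1000);
writeCache(url, { title: "v1" }, 1005); // duplicate write, later timestamp

console.log(cache.get(url).cachedAt); // 1005 — the second write wins
```

In the common case both writers hold the same response, so the duplicate write is harmless; the edge case above only bites when the upstream resource changes in the gap between the two fetches.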
If I ask v8r to validate multiple files (e.g. `v8r *.json`), it will work through each of the files in sequence, fetching the schema and then validating the file. Caching the catalog and schemas speeds things up a bit - particularly if we are validating lots of files against the same schema. However, there is scope to make this a lot faster by doing things in parallel. The process of fetching and resolving schema references in particular is I/O-bound and lends itself to being done in parallel to speed things up.

Probably the ideal workflow here is something like:
Some possible problems:
This is quite a big/fiddly project, but this could make v8r a lot faster in some situations.
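As a rough illustration of why the I/O-bound fetching step parallelises well, here is a sketch using `Promise.all`. The `fetchSchema` helper is a stand-in (a timer simulating network latency), not v8r's real fetching code:

```javascript
// Stand-in for an HTTP request: resolves after an I/O-like delay.
function fetchSchema(url) {
  return new Promise((resolve) =>
    setTimeout(() => resolve({ $id: url }), 50)
  );
}

// Sequential: total time is roughly the sum of all request times.
async function fetchAllSequential(urls) {
  const schemas = [];
  for (const url of urls) {
    schemas.push(await fetchSchema(url)); // one request at a time
  }
  return schemas;
}

// Parallel: all requests are in flight at once, so total time is
// roughly the slowest single request, not the sum.
function fetchAllParallel(urls) {
  return Promise.all(urls.map(fetchSchema));
}

const urls = [
  "https://example.com/a.json",
  "https://example.com/b.json",
  "https://example.com/c.json",
];

fetchAllParallel(urls).then((schemas) => {
  console.log(schemas.length); // 3, fetched concurrently
});
```

Note this sketch ignores the cache-write races discussed above: firing duplicate fetches in parallel makes those races more likely, which is part of what makes this project fiddly.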