Stranger6667 / jsonschema-rs

JSON Schema validation library
https://docs.rs/jsonschema
MIT License

Streaming to validate very large JSON files #229

Open DavidFarago opened 3 years ago

DavidFarago commented 3 years ago

Is there (or will there be) an option to validate a very large JSON file (up to 5GB) in chunks, e.g. via streaming, so that the whole JSON file never has to be held in memory?

This would be awesome, since I haven't found any other JSON Schema validator for Python that can do this. For other languages there is, for example, https://github.com/worldturner/medeia-validator.

Stranger6667 commented 3 years ago

In principle, I think it is possible, as serde supports deserialization without buffering, and the file could be accessed through mmap. I am not sure how much effort it will require, but I'd be happy to have support for this feature here :)
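Until something like this exists in the library, one possible workaround is to validate the file piece by piece. The following is only a minimal sketch, not part of jsonschema-rs, and it rests on assumptions the thread does not state: that the large file is a single top-level JSON array, that each element is small enough to fit in memory, and that each element can be validated on its own. The names `iter_array_items`, `invalid_item_indexes`, and `is_valid` are hypothetical; `is_valid` stands for whatever per-item check you supply. Only the Python standard library is used.

```python
import json


def iter_array_items(fp, chunk_size=1 << 16):
    """Yield the elements of a top-level JSON array one at a time.

    `fp` is a text-mode file object. Only the unread part of the current
    chunks plus one element is held in memory, never the whole file.
    """
    decoder = json.JSONDecoder()
    buf, eof = "", False

    def more():
        # Append the next chunk; return False once the file is exhausted.
        nonlocal buf, eof
        chunk = fp.read(chunk_size)
        eof = not chunk
        buf += chunk
        return not eof

    # Find the opening bracket of the top-level array.
    while not buf.lstrip():
        if not more():
            raise ValueError("empty input")
    buf = buf.lstrip()
    if buf[0] != "[":
        raise ValueError("expected a top-level JSON array")
    buf = buf[1:]

    after_comma = False
    while True:
        buf = buf.lstrip()
        if not buf:
            if not more():
                raise ValueError("unexpected end of input inside array")
            continue
        if buf[0] == "]" and not after_comma:
            return  # end of the array
        try:
            item, end = decoder.raw_decode(buf)
        except json.JSONDecodeError:
            # Either the element is split across chunks or the input is bad.
            if not more():
                raise
            continue
        rest = buf[end:].lstrip()
        if not rest or rest[0] not in ",]":
            # The delimiter is not visible yet (for example, a number cut in
            # half at a chunk boundary), so read more before trusting `item`.
            if not more():
                raise ValueError("expected ',' or ']' after array item")
            continue
        yield item
        after_comma = rest[0] == ","
        buf = rest[1:] if after_comma else rest


def invalid_item_indexes(fp, is_valid, chunk_size=1 << 16):
    """Return the positions of the elements for which `is_valid` is false."""
    items = iter_array_items(fp, chunk_size)
    return [i for i, item in enumerate(items) if not is_valid(item)]
```

A call could look like `invalid_item_indexes(open("big.json", encoding="utf-8"), is_valid)`, where `big.json` is a placeholder path. Note that an element spanning many chunks is re-parsed each time a chunk is added, so this approach suits arrays of many moderately sized elements rather than a few huge ones, and it cannot check constraints that apply to the array as a whole.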

DavidFarago commented 3 years ago

Very cool, and thanks for the fast reply.

Can you give a rough estimate of when it might be available? I would love to use jsonschema-rs for our microservice, but since the service has to handle very large JSON files within 2 weeks, I wonder if I have to port it to a JVM language to be able to use https://github.com/worldturner/medeia-validator...

Stranger6667 commented 3 years ago

My guess would be a few months. Unfortunately, I don't have the bandwidth to work on this at the moment :(