Currently, JSON parsing is done with e.g. `serde_json::de::from_reader(File::open(path)?)` to a `serde_json::Value`. This is fine for small and medium-sized JSON documents, but it does far too many allocations to be efficient once the JSON grows beyond a few hundred MB (not a fault of serde, just the reality of parsing unknown JSON into memory). I see at least two alternatives here:
1. Rewrite the inference code to work with streaming JSON data rather than parsing the entire structure into memory at once. (Proof of concept written; it does the trick. What remains is a big refactoring in the actual crate.)
2. Replace the use of `serde_json::Value` with a mostly compatible type that uses a string pool, since repeated object keys account for a very large share of the memory used.
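To illustrate the streaming alternative: a minimal sketch of shape inference over a flat event stream, keeping only the running shape in memory. The `Token` and `Kind` types and the `infer` function are hypothetical stand-ins for events from a real streaming JSON parser, and this assumes the inference amounts to collecting the value kinds seen per key; the actual crate's inference is surely richer.

```rust
use std::collections::BTreeMap;

// Hypothetical value kinds and parser events; a real streaming parser
// (e.g. driving serde's Deserializer directly) would emit something richer.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Kind { Null, Bool, Number, String }

enum Token<'a> {
    ObjStart,
    ObjEnd,
    Key(&'a str),
    Value(Kind),
}

// Fold the token stream into a key -> kinds-seen map. Memory use is
// proportional to the inferred shape, not to the document size.
fn infer(tokens: impl IntoIterator<Item = Token<'static>>) -> BTreeMap<&'static str, Vec<Kind>> {
    let mut shape: BTreeMap<&str, Vec<Kind>> = BTreeMap::new();
    let mut pending: Option<&str> = None;
    for t in tokens {
        match t {
            Token::Key(k) => pending = Some(k),
            Token::Value(kind) => {
                if let Some(k) = pending.take() {
                    let kinds = shape.entry(k).or_default();
                    if !kinds.contains(&kind) {
                        kinds.push(kind);
                    }
                }
            }
            Token::ObjStart | Token::ObjEnd => pending = None,
        }
    }
    shape
}

fn main() {
    use Token::*;
    // Two records with the same keys; only the running shape is retained.
    let stream = vec![
        ObjStart, Key("id"), Value(Kind::Number), Key("name"), Value(Kind::String), ObjEnd,
        ObjStart, Key("id"), Value(Kind::Number), Key("name"), Value(Kind::Null), ObjEnd,
    ];
    let shape = infer(stream);
    assert_eq!(shape["id"], vec![Kind::Number]);
    assert_eq!(shape["name"], vec![Kind::String, Kind::Null]);
    println!("{:?}", shape);
}
```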
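And for the string-pool alternative, a hedged sketch of the key-interning idea (the `StringPool` type and its API are hypothetical, not part of any existing crate): each distinct key string is allocated once and handed out as a shared `Rc<str>`, so a million objects repeating the same keys pay for each key string only once.

```rust
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical string pool: maps each distinct key to its single allocation.
// The unit value makes this effectively a HashSet keyed by the Rc<str>.
#[derive(Default)]
struct StringPool {
    interned: HashMap<Rc<str>, ()>,
}

impl StringPool {
    fn intern(&mut self, s: &str) -> Rc<str> {
        // Reuse the existing allocation if this key was seen before.
        if let Some((existing, _)) = self.interned.get_key_value(s) {
            return Rc::clone(existing);
        }
        let fresh: Rc<str> = Rc::from(s);
        self.interned.insert(Rc::clone(&fresh), ());
        fresh
    }
}

// Returns (number of key handles produced, number of distinct allocations).
fn demo() -> (usize, usize) {
    let mut pool = StringPool::default();
    // Simulate many objects repeating the same three keys.
    let keys: Vec<Rc<str>> = (0..1_000)
        .flat_map(|_| ["id", "name", "tags"])
        .map(|k| pool.intern(k))
        .collect();
    (keys.len(), pool.interned.len())
}

fn main() {
    let (handles, distinct) = demo();
    println!("{handles} handles over {distinct} distinct strings");
}
```

A `Value`-compatible type would then store `Rc<str>` (or `Arc<str>` for `Send`) where `serde_json::Value` stores `String` object keys; everything else could stay structurally identical.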