PSeitz / serde_json_borrow

Fast JSON deserialization on borrowed data
MIT License


Serde JSON Borrow

Up to 2x faster JSON parsing for ndjson-style use cases.

serde_json_borrow deserializes JSON from a &'ctx str into a serde_json_borrow::Value<'ctx> DOM. Where possible, it references the original bytes instead of copying them into Strings.

In contrast, the default serde_json parses into an owned serde_json::Value. Every string it encounters is copied and therefore allocated. That's great for ergonomics, but not for performance, especially when the DOM representation is only an intermediate structure.
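
The difference between referencing and copying can be sketched with the standard library alone. This is an illustration of the idea, not the crate's parser; the helper names `first_string_borrowed`, `first_string_owned`, and `points_into` are hypothetical, and the lookup ignores escape sequences.

```rust
/// Hypothetical helper: returns the text between the first pair of double
/// quotes as a slice of `input`. No bytes are copied and nothing is allocated.
fn first_string_borrowed(input: &str) -> Option<&str> {
    let start = input.find('"')? + 1;
    let len = input[start..].find('"')?;
    Some(&input[start..start + len])
}

/// Same lookup, but the bytes are copied into a freshly allocated `String`,
/// which is what an owned DOM has to do for every string it encounters.
fn first_string_owned(input: &str) -> Option<String> {
    first_string_borrowed(input).map(str::to_owned)
}

/// Returns true if `slice` lies inside the memory occupied by `buffer`.
fn points_into(buffer: &str, slice: &str) -> bool {
    let buf = buffer.as_ptr() as usize;
    let ptr = slice.as_ptr() as usize;
    ptr >= buf && ptr + slice.len() <= buf + buffer.len()
}
```

For an input such as `{"name": "ferris"}`, the borrowed result points into the input buffer, while the owned result lives in its own heap allocation. The borrowed form is only valid as long as the input is, which is where the 'ctx lifetime comes from.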

To gain a little more performance, serde_json_borrow pushes the (key, value) pairs of JSON objects into a Vec instead of a BTreeMap. Access works via an iterator, which has the same API as iterating over a BTreeMap.
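
A minimal sketch of a Vec-backed object, assuming values are simplified to `i64`. The type `VecObject` and its methods are hypothetical names for illustration and are not the crate's API.

```rust
/// Hypothetical Vec-backed JSON object. Keys borrow from the input;
/// values are simplified to `i64` to keep the sketch short.
struct VecObject<'a> {
    entries: Vec<(&'a str, i64)>,
}

impl<'a> VecObject<'a> {
    fn new() -> Self {
        Self { entries: Vec::new() }
    }

    /// Inserting is a plain push: no key comparison and no tree rebalancing.
    /// Duplicate keys are not detected in this sketch.
    fn insert(&mut self, key: &'a str, value: i64) {
        self.entries.push((key, value));
    }

    /// Lookup is a linear scan that returns the first matching key.
    fn get(&self, key: &str) -> Option<i64> {
        self.entries.iter().find(|(k, _)| *k == key).map(|(_, v)| *v)
    }

    /// Yields key-value pairs in insertion order.
    fn iter(&self) -> impl Iterator<Item = (&'a str, &i64)> + '_ {
        self.entries.iter().map(|(k, v)| (*k, v))
    }
}
```

The trade-off in this sketch is that building the object is cheap, while lookup by key costs a scan over all entries. Unlike a BTreeMap, iteration follows insertion order rather than sorted key order.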

OwnedValue

You can use OwnedValue to parse a String containing unparsed JSON into a Value without having to worry about lifetimes: OwnedValue takes ownership of the String and references slices of it rather than making copies.

Limitations

The cowkeys feature flag uses Cow<str> instead of &str for keys in objects. This enables support for escaped data in keys. Without the cowkeys feature flag, keys are &str, which cannot represent JSON escape sequences.

Escape sequences (see https://www.json.org/json-en.html) that are not supported in keys without the cowkeys feature flag:

\" represents the quotation mark character (U+0022).
\\ represents the reverse solidus character (U+005C).
\/ represents the solidus character (U+002F).
\b represents the backspace character (U+0008).
\f represents the form feed character (U+000C).
\n represents the line feed character (U+000A).
\r represents the carriage return character (U+000D).
\t represents the character tabulation character (U+0009).
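
Why escaped keys need Cow<str> can be sketched as follows: a key without a backslash can stay a slice of the input, but a key containing an escape decodes to different bytes than the input holds, so it needs its own allocation. The function `decode_key` is a hypothetical helper, not the crate's implementation, and it only covers the two-character escapes listed above.

```rust
use std::borrow::Cow;

/// Hypothetical helper: decodes the two-character escapes listed above.
/// Returns `None` for a trailing backslash or any escape not in that list.
fn decode_key(raw: &str) -> Option<Cow<'_, str>> {
    if !raw.contains('\\') {
        // Fast path: the key is usable as a plain slice of the input.
        return Some(Cow::Borrowed(raw));
    }
    // Slow path: the decoded key differs from the input bytes,
    // so it has to be written into a new String.
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        out.push(match chars.next()? {
            '"' => '"',
            '\\' => '\\',
            '/' => '/',
            'b' => '\u{0008}',
            'f' => '\u{000C}',
            'n' => '\n',
            'r' => '\r',
            't' => '\t',
            _ => return None,
        });
    }
    Some(Cow::Owned(out))
}
```

A plain &str key has no equivalent of the `Cow::Owned` branch, which is why escapes in keys are unsupported without the feature flag.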

Benchmark

cargo bench


# TODO 
Instead of parsing a JSON object into a `Vec`, a `BTreeMap` could be enabled via a feature flag.

# Mutability
`OwnedValue` is immutable by design.
If you need to mutate the `Value` you can convert it to `serde_json::Value`.

## Example
Here is an example of why mutability won't work:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=bb0b919acc8930e71bdefdfc6a6d5240
```rust
use std::io;

use std::borrow::Cow;

/// Parses a `String` into a `Value` by taking ownership of the `String` and referencing slices
/// of it instead of copying the contents.
///
/// This is done to mitigate lifetime issues.
pub struct OwnedValue {
    /// Keep owned data, to be able to safely reference it from Value<'static>
    _data: String,
    value: Vec<Cow<'static, str>>,
}

impl OwnedValue {
    /// Takes ownership of a `String` and parses it into a DOM.
    pub fn parse_from(data: String) -> io::Result<Self> {
        let value = vec![Cow::from(data.as_str())];
        let value = unsafe { extend_lifetime(value) };
        Ok(Self { _data: data, value })
    }

    /// Returns the `Value` reference.
    pub fn get_value<'a>(&'a self) -> &'a Vec<Cow<'a, str>> {
        &self.value
    }
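    /// Unsound: casting `'static` to `'a` behind a `&mut` lets callers push borrows
    /// that do not outlive `self`, and the borrow checker cannot detect it.
    pub fn get_value_mut<'a>(&'a mut self) -> &'a mut Vec<Cow<'a, str>> {
        unsafe {
            std::mem::transmute::<&mut Vec<Cow<'static, str>>, &mut Vec<Cow<'a, str>>>(
                &mut self.value,
            )
        }
    }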
}

unsafe fn extend_lifetime<'b>(r: Vec<Cow<'b, str>>) -> Vec<Cow<'static, str>> {
    std::mem::transmute::<Vec<Cow<'b, str>>, Vec<Cow<'static, str>>>(r)
}

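fn main() {
    let v1 = OwnedValue::parse_from(String::from("oop")).unwrap();
    let mut v2 = OwnedValue::parse_from(String::from("oop")).unwrap();
    // This `Cow` borrows from the `String` owned by `v1`.
    let oop = v1.get_value().last().unwrap().clone();
    // The mutable accessor allows storing that borrow inside `v2`.
    v2.get_value_mut().push(oop);
    // Dropping `v1` frees the `String` the stored `Cow` points into.
    drop(v1);
    // The popped `Cow` now dangles: reading it is a use-after-free.
    let oop = v2.get_value_mut().pop().unwrap();
    println!("oop: '{oop}'");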
}
```