Marcono1234 / struson

Streaming JSON reader and writer written in Rust
https://crates.io/crates/struson
Apache License 2.0

[simple-api] Support automated repeated `seek_to` for all array or object elements #48

Closed Marcono1234 closed 6 months ago

Marcono1234 commented 6 months ago

Problem solved by the enhancement

The current seek_to method only supports seeking to a single value; it does not support traversing all elements of an intermediate array or object.

For example, consider this JSON path (where [*] represents all array items, and .* represents the values of all members, regardless of name):

first.second[*].third.*.fifth

Currently you could seek to first.second but afterwards you would have to traverse the remainder manually.
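To make the intended path semantics concrete, here is a minimal sketch. It is not Struson's implementation: it walks an in-memory tree instead of a stream, and the names `Value`, `Piece`, `for_each_match`, `obj`, `sample` and `collect_fifths` are all hypothetical, invented only for this illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory JSON value; Struson itself reads from a stream.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Number(i64),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Hypothetical path pieces mirroring the notation used above.
enum Piece {
    Member(&'static str), // `.name`
    AllItems,             // `[*]`
    AllMembers,           // `.*`
}

/// Calls `f` for every value matched by `path` and returns the match count.
/// Missing members and values of the wrong type are silently skipped.
fn for_each_match<F: FnMut(&Value)>(value: &Value, path: &[Piece], f: &mut F) -> usize {
    let (piece, rest) = match path.split_first() {
        Some(split) => split,
        None => {
            // Path exhausted: this value is a match.
            f(value);
            return 1;
        }
    };
    let mut count = 0;
    match (piece, value) {
        (Piece::Member(name), Value::Object(members)) => {
            if let Some(v) = members.get(*name) {
                count += for_each_match(v, rest, f);
            }
        }
        (Piece::AllItems, Value::Array(items)) => {
            for v in items {
                count += for_each_match(v, rest, f);
            }
        }
        (Piece::AllMembers, Value::Object(members)) => {
            for v in members.values() {
                count += for_each_match(v, rest, f);
            }
        }
        _ => {}
    }
    count
}

/// Builds an object from `(name, value)` pairs.
fn obj(members: Vec<(&str, Value)>) -> Value {
    Value::Object(members.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Sample document with two array items under `first.second`.
fn sample() -> Value {
    let fifth = |n: i64| obj(vec![("fifth", Value::Number(n))]);
    let item1 = obj(vec![("third", obj(vec![("a", fifth(1)), ("b", fifth(2))]))]);
    let item2 = obj(vec![("third", obj(vec![("c", fifth(3))]))]);
    obj(vec![("first", obj(vec![("second", Value::Array(vec![item1, item2]))]))])
}

/// Collects every value matched by `first.second[*].third.*.fifth`.
fn collect_fifths(doc: &Value) -> Vec<Value> {
    let path = [
        Piece::Member("first"),
        Piece::Member("second"),
        Piece::AllItems,
        Piece::Member("third"),
        Piece::AllMembers,
        Piece::Member("fifth"),
    ];
    let mut found = Vec::new();
    for_each_match(doc, &path, &mut |v: &Value| found.push(v.clone()));
    found
}
```

Calling `collect_fifths(&sample())` visits the `fifth` value below every member of every `third` object. A streaming implementation would follow the same recursion, but it would also have to skip unmatched values in the input rather than just ignore them.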

Enhancement description

There should be a read_all_seeked(path: MultiJsonPath, f: FnMut(ValueReader)) (pseudo code) method.

Possibly there could also be a method read_seeked(path: MultiJsonPath, f: FnOnce(SeekedReader)), where SeekedReader can repeatedly read and has a has_next method; this would be similar to the existing ValueReader::read_array. However, I am not sure whether there is really a common use case for this.

(the method and type names here are only initial ideas)

Alternatives / workarounds

Instead of adding a separate MultiJsonPath type, the existing JsonPath (or rather JsonPathPiece) could be extended. However, that might be confusing, since the use cases differ and the regular seek_to method would not actually support the "all array items" / "all member values" path pieces (it would have to return an error or panic for them).

Maybe it would also be possible to provide this for the "Advanced API", but there the implementation might not be as clean as it would be for the "Simple API" where there are separate structs and they implicitly make sure a value is fully read.

[^1]: If desired, this could even be extended with path piece variants that take a predicate: Fn(u32) -> bool for array items and Fn(&str) -> bool for member names. They probably shouldn't be FnMut, though, because that would allow misusing them for something other than a predicate; in that case separate seeking calls might be a cleaner solution.

Marcono1234 commented 6 months ago

It might be good if the read_all_seeked method had a require_match: bool parameter or similar to enforce that at least one value was matched, that is, that f was called at least once. For example, with require_match=true and the path .*[*], this would fail for the JSON data {} and {"a": []}, but succeed for {"a": [1]} and also for {"a": [], "b": [1]}.
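The four cases above can be checked with a small sketch. Again, this is only an illustration on an in-memory tree, not the proposed streaming method: `Value`, `Piece`, `visit`, `obj` and `matches_required` are hypothetical helpers, and `read_all_seeked` here only borrows the name from the proposal.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory JSON value used only for this illustration.
enum Value {
    Number(i64),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Hypothetical path pieces: `[*]` and `.*`.
enum Piece {
    AllItems,
    AllMembers,
}

/// Calls `f` for every matched value and returns how often it was called.
fn visit<F: FnMut(&Value)>(value: &Value, path: &[Piece], f: &mut F) -> usize {
    let (piece, rest) = match path.split_first() {
        Some(split) => split,
        None => {
            f(value);
            return 1;
        }
    };
    let mut count = 0;
    match (piece, value) {
        (Piece::AllItems, Value::Array(items)) => {
            for v in items {
                count += visit(v, rest, f);
            }
        }
        (Piece::AllMembers, Value::Object(members)) => {
            for v in members.values() {
                count += visit(v, rest, f);
            }
        }
        _ => {}
    }
    count
}

/// Sketch of the proposed behavior: with `require_match`, zero matches is an error.
fn read_all_seeked<F: FnMut(&Value)>(
    value: &Value,
    path: &[Piece],
    require_match: bool,
    mut f: F,
) -> Result<usize, String> {
    let count = visit(value, path, &mut f);
    if require_match && count == 0 {
        Err("path matched no value".to_string())
    } else {
        Ok(count)
    }
}

/// Builds an object from `(name, value)` pairs.
fn obj(members: Vec<(&str, Value)>) -> Value {
    Value::Object(members.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Returns whether the path `.*[*]` matches at least one value in `doc`.
fn matches_required(doc: &Value) -> bool {
    read_all_seeked(doc, &[Piece::AllMembers, Piece::AllItems], true, |_| {}).is_ok()
}
```

In this sketch the check is global: one match anywhere is enough, which is why {"a": [], "b": [1]} succeeds even though "a" contributes nothing.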

Marcono1234 commented 6 months ago

@hjylxmhzq, this feature is probably useful for you; it was actually inspired by your Struson usage in your gemini-client-rs project [^1]. With this feature you could probably write it like this:

use struson::reader::simple::*;
use struson::reader::simple::multi_json_path::multi_json_path;

let path = multi_json_path![
    [*],
    ?"candidates",
    [*],
    ?"content",
    ?"parts",
    [*],
    ?"text",
];
let success = json_reader.read_seeked_multi(&path, false, |value_reader| {
    let v = value_reader.read_string()?;
    tx.blocking_send(StreamItem::Data(v))?;
    has_content = true;
    Ok(())
});

...

The [*] here means all array items, allowing an empty array. If you expect the array to not be empty you can use [+] instead. The ?"name" means an optional member. If you expect the member to be present you can omit the ?.
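As a rough illustration of how the strict and lenient path pieces differ, the sketch below models them on an in-memory tree. It does not use Struson and makes no claim about how Struson reports a failed expectation; it simply assumes that an empty array for `[+]` or a missing non-optional member is an error. `Value`, `Piece`, `visit`, `obj`, `sample` and `collect_texts` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory JSON value used only for this illustration.
enum Value {
    Text(String),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Hypothetical path pieces.
enum Piece {
    /// `[*]` when `allow_empty` is true, `[+]` otherwise.
    Items { allow_empty: bool },
    /// `?"name"` when `optional` is true, `"name"` otherwise.
    Member { name: &'static str, optional: bool },
}

/// Calls `f` with every matched string; fails if an expectation is violated.
fn visit<F: FnMut(&str)>(value: &Value, path: &[Piece], f: &mut F) -> Result<(), String> {
    let (piece, rest) = match path.split_first() {
        Some(split) => split,
        None => {
            return match value {
                Value::Text(s) => {
                    f(s.as_str());
                    Ok(())
                }
                _ => Err("expected a string".to_string()),
            };
        }
    };
    match (piece, value) {
        (Piece::Items { allow_empty }, Value::Array(items)) => {
            if items.is_empty() && !*allow_empty {
                return Err("array must not be empty".to_string());
            }
            for item in items {
                visit(item, rest, f)?;
            }
            Ok(())
        }
        (Piece::Member { name, optional }, Value::Object(members)) => {
            match members.get(*name) {
                Some(v) => visit(v, rest, f),
                None if *optional => Ok(()), // optional member absent: no match, no error
                None => Err(format!("missing member '{}'", name)),
            }
        }
        _ => Err("unexpected value type".to_string()),
    }
}

/// Builds an object from `(name, value)` pairs.
fn obj(members: Vec<(&str, Value)>) -> Value {
    Value::Object(members.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Two array items; only the first one has a "candidates" member.
fn sample() -> Value {
    let text = obj(vec![("text", Value::Text("Hi".to_string()))]);
    let content = obj(vec![("parts", Value::Array(vec![text]))]);
    let candidate = obj(vec![("content", content)]);
    let with_candidates = obj(vec![("candidates", Value::Array(vec![candidate]))]);
    Value::Array(vec![with_candidates, obj(vec![])])
}

/// Mirrors the path above; `candidates_optional` toggles `?"candidates"` vs `"candidates"`.
fn collect_texts(doc: &Value, candidates_optional: bool) -> Result<Vec<String>, String> {
    let any_items = || Piece::Items { allow_empty: true };
    let opt = |name: &'static str| Piece::Member { name, optional: true };
    let path = [
        any_items(),
        Piece::Member { name: "candidates", optional: candidates_optional },
        any_items(),
        opt("content"),
        opt("parts"),
        any_items(),
        opt("text"),
    ];
    let mut texts = Vec::new();
    visit(doc, &path, &mut |s: &str| texts.push(s.to_string()))?;
    Ok(texts)
}
```

With `candidates_optional` set to true, the second array item of `sample()` is skipped without complaint; with false, the same item makes the whole traversal fail.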

This feature is available since the latest Struson release, version 0.5.0.

I hope this feature is useful for you. Any feedback is appreciated!

[^1]: I was just a bit curious how Struson is used currently, and noticed your project.

hjylxmhzq commented 5 months ago

Fantastic feature, this really helps! I really appreciate the work you've done. 👏