Closed CPunisher closed 2 months ago
thanks a lot. we could support get_by_schema
here, and the get_many
API is hard to use
thanks a lot. we could support
get_by_schema
here, and theget_many
API is hard to use
Is get_by_schema
a new API to replace get_many
?
yeah, like this https://github.com/bytedance/sonic-cpp/pull/85
Would you be interested?
Would you be interested?
I see. I will investigate and have a try.
Would you be interested?
Hello, after investigating, running and debugging schema parsing API on sonic_cpp, I'm very confused about the specification of it (what does it do and how does it replace get_many
).
Could you please provide any extra information?
as follows: This RFC is not yet mature, we can discuss some corner cases more and update it.
compare with get_many
(now):
pros: easy to write, and the schema can convert from any Serde type, for example, we can use to_value
to convert a map
into a schema.
cons: cannot represent the index of an array
use sonic_rs::json;
fn get_by_schema(data: &str, mut schema: sonic_rs::Value) -> sonic_rs::Value {
todo!()
}
fn main() {
// use a dynamic value to reperesent the JSON schem, now we use `sonic_rs::Value`
let mut schema = json!({
"a": null, // default value is `null`
"b": {
"b1": {},
"b2": "default string" // default value is string
},
"c": [], // default value is []
});
let data = r#"
{
"a": {},
"b": {
"b1": 123,
},
"c": [1, 2, 3],
"d": "balabala..."
}
"#;
// parse json data by schem, we can parse into the schema value inplace
let got: sonic_rs::Value = get_by_schema(data, schema);
assert_eq!(
got,
json!({
"a": {},
"b": {
"b1": 123,
"b2": "default string"
},
"c": [1, 2, 3]
})
);
}
as follows: This RFC is not yet mature, we can discuss some corner cases more and update it.
compare with
get_many
(now): pros: easy to write, and the schema can convert from any Serde type, for example, we can useto_value
to convert amap
into a schema. cons: cannot represent the index of an arrayuse sonic_rs::json; fn get_by_schema(data: &str, mut schema: sonic_rs::Value) -> sonic_rs::Value { todo!() } fn main() { // use a dynamic value to reperesent the JSON schem, now we use `sonic_rs::Value` let mut schema = json!({ "a": null, // default value is `null` "b": { "b1": {}, "b2": "default string" // default value is string }, "c": [], // default value is [] }); let data = r#" { "a": {}, "b": { "b1": 123, }, "c": [1, 2, 3], "d": "balabala..." } "#; // parse json data by schem, we can parse into the schema value inplace let got: sonic_rs::Value = get_by_schema(data, schema); assert_eq!( got, json!({ "a": {}, "b": { "b1": 123, "b2": "default string" }, "c": [1, 2, 3] }) ); }
According to your example, I can know:
["b", "b1"]
exists in the schema, but does not exist in the data, then the data will be filled with the corresponding default value in the schema.Also, I find the cpp version respects orders of fields. This could be one of my most confused points. For example:
std::string json_schema = R"(
{"obj":1}
)";
std::string json = R"(
{"it":1, "obj":{"a":{"b":1}, "b":[1]}}
)";
// output: {"obj":{"a":{"b":1},"b":[1]}}
std::string json_schema = R"(
{"obj":1}
)";
std::string json = R"(
{ "obj":{"a":{"b":1}, "it":1,"b":[1]}}
)";
// output: {"obj":{"a":{"b":1},"it":1,"b":[1]}}
std::string json_schema = R"(
{"it":1, "c":[1], "obj":{"a":{"b":1}}}
)";
std::string json = R"(
{"it":1, "b":[1], "obj":{"a":{"b":1}}}
)";
// output: {"it":1,"c":[1],"obj":{"a":{"b":1}}}
Are these all expected behaviors?
It seems that schemas are designed to make some transformation to the data. But the document shows that get_many
is used for lazily getting many nested values by keys. How can schema do that?
The behavior is right.
The get_many
must construct a PointerTree
through multiple PointerPath
. The tree can be also built into a schema, and we also can get all the targeted fields from a json by the schema.
If using get_by_schema
, the user can also get the target fields through the key index, such as root["a"]["b"]
, which will not need to care about the index of PointerPath
.
Thus, get_by_schema
is easier to get_many
. and the hard case is how to deal with the array index.
I try to make
get_many
returnVec<Result<LazyValue>>
instead ofResult<Vector<LazyValue>>
. I think the purpose is to generate separate errors for each query path rather than one overall error.Now everything seems ok, but there are still some problems and I'm hoping for some suggestions:
It's hard to return
Vec<Result<LazyValue>>
directly. As my code shows, a solution is replacing every element inout
with the same error.get_many
has to validate utf8 input. This is a validation for the whole input.self.get_many_rec
call inget_many
. https://github.com/cloudwego/sonic-rs/blob/99b37cd1ff4dfdf5b9ee8b45c91f1403f7c0257a/src/parser.rs#L2119I wrap
Result<LazyValue<'_>>
inRc
: Motivation: We need to clone the LazyValue for all possible orders. However,Result
is not clonable sinceError
is not clonable. ImplementClone
forError
is either impossible, because the innerstd::io::Error
is not clonable. https://github.com/cloudwego/sonic-rs/blob/99b37cd1ff4dfdf5b9ee8b45c91f1403f7c0257a/src/parser.rs#L1859 Pros: Performance is optimized by avoidinglv.clone()
Cons: It's hard to use the result. As the example shows, we have to writemany[0].as_ref().as_ref().unwrap().as_raw_str()
compared to the originalmany[0].as_raw_str()
Any suggestion?