boostorg / json

A C++11 library for parsing and serializing JSON to and from a DOM container in memory.
https://boost.org/libs/json
Boost Software License 1.0
428 stars 93 forks source link

Inserting values at pointers #700

Closed doganulus closed 1 year ago

doganulus commented 2 years ago

Currently trying out the new JSON pointer implementation and I think this would be very useful in many cases.

One thing I have needed is inserting or assigning arbitrary JSON values at locations specified by JSON pointers. Is it currently possible? How can I achieve it in the current state?

sehe commented 2 years ago

Probably a duplicate of https://github.com/boostorg/json/issues/480

pdimov commented 2 years ago

Not a duplicate. That issue is about finding an element by its path, and this one is about inserting an element at a specified path.

grisumbras commented 2 years ago

Just to be sure, this is about creating intermediary objects, right? Do you also have any suggestions on how to deal with non-existant elements of arrays?

doganulus commented 2 years ago

Yes, non-existent objects are to be created at the insertion. For example,

value jv1 {{"a": {{"b", 1}}}};
value jv2 {{"c": {{"d", 2}}}};

jv1.insert_or_assign("/x", jv2);
jv1.insert_or_assign("/a/b", jv2);

assert(jv1.at_pointer"/x/c/d" == 2)     // new value inserted at previously non-existent path /x
assert(jv1.at_pointer("/a/b/c/d") == 2) // new value assigned at the existing path /a/b

For non-existent array elements, I think std::vector assignment behavior might be preferable. Assign the value if the index is in range or else return an out-of-range/not-found error. Inserting to the back of an array can be specified using the - character according to RFC 6901. A helper function to get the size of a json::array at a given pointer path may be needed as well. Getting the size of objects/arrays via pointers is currently possible.

doganulus commented 2 years ago

Further investigations with the current API:

For objects:

value jv = {{"abc", {{"xyz", {12, 13, 15}}}}};

jv.at_pointer("/abc/xyz") = {"123", "1231"}; // works as desired
jv.at_pointer("/abc/klm") = {"123", "1231"}; // does not work but desired

For arrays:

value jv = {{"abc", {{"xyz", {12, 13, 15}}}}};

jv.at_pointer("/abc/xyz/0") = {"123", "1231"}; // works as desired
jv.at_pointer("/abc/xyz/3") = {"123", "1231"}; // does not work as desired
jv.at_pointer("/abc/xyz/-") = {"123", "1231"}; // does not work but desired
pdimov commented 2 years ago

We're having differences of opinion on what jv.insert_at_pointer("/a/2", 5) needs to create, an array or an object. @grisumbras thinks it should create an object and set its "2" key to 5. I find this consistent but kind of not really doing what one would want in the majority of cases. It seems to me that it should create an array and set its third element to 5 (leaving the first two null).

grisumbras commented 2 years ago

There are 3 questions we need to answer to decide on the behaviour.

  1. Whether to replace nulls when creating intermediaries.
  2. When creating intermediaries whether to treat a number segment of the path as an implication of array index or object key.
  3. Whether to extend arrays with null elements when a referenced index does not exist.

Positive answer to question 3 also implies positive answer to question 1, as otherwise there's very little utility in it. Personally, I am leaning towards 1 - yes, 2 - no, 3 - no. The only way to create an intermediary array would be using - segement. That would also be the only way to extend an array.

First of all, I very much don't like option 3. I'm also not sure if it's actually needed, because - segment can be used. Without option 3 there will be no way reason to create arrays using number segments, just use -. So option 2 becomes irrelevant.

Meanwhile, option 1 is very useful IMO, doesn't create inconsistencies and probably even simplifies implementation.

Examples with my approach:

value jv = parse(R"( {"a": null, "x": [2]}  )");
jv.assign_at_pointer("/a/b/1", 3); // {"a": {"b": {"1", 3}}, "x": [2]}
jv.assign_at_pointer("/x/-", 4);   // {"a": {"b": {"1", 3}}, "x": [2, 4]}
jv.assign_at_pointer("/y/-", 5);   // {"a": {"b": {"1", 3}}, "x": [2, 4], "y": [5]}
jv.assign_at_pointer("/z/0", 6);   // {"a": {"b": {"1", 3}}, "x": [2, 4], "y": [5], "z": {"0": 6}}
jv.assign_at_pointer("/x/1", 7);   // {"a": {"b": {"1", 3}}, "x": [2, 7], "y": [5], "z": {"0": 6}}
jv.assign_at_pointer("/x/3", 8);   // error: we don't extend arrays by more than one element

So, to help me understand if there's a case for supporting option 3, can anyone present a realistic use case?

pdimov commented 2 years ago

There's no use case for (1) if you don't have (2)+(3).

pdimov commented 2 years ago

(1) is useful if one has (3) and wants to maintain order independence as in

value jv = array();
jv.assign_at_pointer("/1/y", 5);
jv.assign_at_pointer("/0/x", 3);

giving the same result as

jv.assign_at_pointer("/0/x", 3);
jv.assign_at_pointer("/1/y", 5);

because there's an implicit null created by the first line.

On second thought, though, I no longer favor (3) because it makes denial of service attacks trivial ("/123456789"). So I agree that numeric indices should be constrained to [0, size()] (with - being the same as size().)

doganulus commented 2 years ago

(1) yes, (3) no.

Still, there might be some rough edges regarding (1) though as it is not only limited to null values but all values that are not objects.

Also I think the minus sign shouldn't create a new array but treat it as a regular key if the stem is not an array. For example,

value jv = parse(R"( {"x": [2]} )");
jv.assign_at_pointer("/x/-", 5);   // {"x": [2, 5]}
jv.assign_at_pointer("/y/-", 5);   // {"x": [2], "y": {"-" : 5}}

The previous behavior can be obtained via assigning an array directly to the stem:

value jv = parse(R"( {"x": [2]} )");
jv.assign_at_pointer("/y", {5});   // {"x": [2], "y": [5]}
jv.assign_at_pointer("/y/-", 6);   // {"x": [2], "y": [5,6]}
doganulus commented 2 years ago

RapidJSON implements the behavior (3). The program below is problematic as @pdimov points.

#include "rapidjson/document.h"
#include "rapidjson/pointer.h"

using namespace rapidjson;

int main()
{
  const char* json = R"({"my_obj": { "my_arr": [10,11,12]}})";
  Document d;
  d.Parse(json);

  Document v;
  v.Parse("20");
  SetValueByPointer(d, "/my_obj/my_arr/999999999", v);

  return 0;
}
grisumbras commented 2 years ago

Interesting. So RapidJSON says yes to 1, 2, and 3. And even converts other kinds of values to arrays and objects, not just nulls. Maybe we need 2 kinds of functions: one conservative and one "helpful".

grisumbras commented 2 years ago

@doganulus @pdimov can you check out #713 to see if you agree with the choices made in that implementation. The choices are:

  1. Nulls are replaced when creating intermediate elements.
  2. When creating intermediate elements segments are always treated as object keys, (in other words, only objects are created as intermediates).
  3. Arrays are extended by 1 element at most.

In other words, the answers to questions discussed previously are 1: yes, 2: no, 3: no.

grisumbras commented 1 year ago

@doganulus can you check out #713 again. It has been rewritten to be customisable. Does the provided functionality accomodate your use cases?

doganulus commented 1 year ago

One question: Can we set values at intermediate keys as follows?

jv = value();
result = &jv.set_at_pointer("/a/b/c", "m"); //
result = &jv.set_at_pointer("/a/b", "n");

Or the option replace_any_scalar just implies that we can only replace the leaf values?

grisumbras commented 1 year ago

No, that option controls whether only nulls or any non-container can be replaced.

grisumbras commented 1 year ago
value jv = 1;
jv.set_at_pointer("/a/b/c", "m"); // exception
jv.set_at_pointer("/a/b/c", "m", {.replace_any_scalar = true}); // success