danielaparker / jsoncons

A C++, header-only library for constructing JSON and JSON-like data formats, with JSON Pointer, JSON Patch, JSON Schema, JSONPath, JMESPath, CSV, MessagePack, CBOR, BSON, UBJSON
https://danielaparker.github.io/jsoncons
Other
699 stars 158 forks source link

Coredump in jsoncons::jsonpath::json_replace() #430

Closed wbangna closed 1 year ago

wbangna commented 1 year ago

The following code leads to a coredump:

  jsoncons::ojson doc = jsoncons::ojson::parse(R"({"value":{"value":"long______________enough"}})");

  jsoncons::jsonpath::json_replace(doc, "$..value",
      [](const jsoncons::ojson& match) {
        return jsoncons::ojson {"XXX"};
      });

It works when the json_replace() overload that takes a value instead of a function is used:

  jsoncons::ojson doc = jsoncons::ojson::parse(R"({"value":{"value":"long______________enough"}})");

  jsoncons::jsonpath::json_replace(doc, "$..value", jsoncons::ojson {"XXX"});

Interestingly the coredump disappears when the inner "value" is rewriten first:

  jsoncons::ojson doc = jsoncons::ojson::parse(R"({"value":{"value":"long______________enough"}})");

  jsoncons::jsonpath::json_replace(doc, "$..value..value",
      [](const jsoncons::ojson& match) {
        return jsoncons::ojson {"XXX"};
      });
  std::cout << doc << '\n';

  // now this works! 
  jsoncons::jsonpath::json_replace(doc, "$..value",
      [](const jsoncons::ojson& match) {
        return jsoncons::ojson {"XXX"};
      });

This bug was discovered in the release version 0.168.7 and reproduced on the current master

wbangna commented 1 year ago

Reading the code in jsoncons_ext/jsonpath/json_query.hpp and jsoncons_ext/jsonpath/expression.hpp I think that I found the problem.

In path_expression::evaluate() the matches of the query are basically kept in a vector of pointers to the nodes (path_value_receiver). When a node containing other nodes is replaced, those nodes are destructed. In the case where a node contains a node that also matches the JSONPath, replacing the first match will invalidate the pointers in the list of selected nodes.

I think the best solution would be to issue the rewrite function while searching for the JSONPath matches and to continue searching with the resulting JSON object. In the given example this fix would lead to only matching/rewriting the first "value" node. The rest of the JSON object would disappear after that rewrite and never be visited.

Another solution would be to use reference-counting for the node references in path_value_receiver. In this case the rewrite would occur for both "value" nodes but only the first node would be kept.

A dirty fix might be to just iterate through the selected nodes backwards, but this behaviour seems to contradict the option result_options::sort.

wbangna commented 1 year ago

I think the best solution would be to issue the rewrite function while searching for the JSONPath matches and to continue searching with the resulting JSON object. In the given example this fix would lead to only matching/rewriting the first "value" node. The rest of the JSON object would disappear after that rewrite and never be visited.

I now found out that the implementation for cases, where the nodup and sort options are not set, already works as I described in my last comment.

The following code runs without problem:

  jsoncons::ojson doc = jsoncons::ojson::parse(R"({"value":{"value":"long______________enough"}})");

  size_t count_rewrites {};
  jsoncons::jsonpath::json_replace(doc, "$..value",
      [&count_rewrites](const std::string&, jsoncons::ojson& match) {
        ++count_rewrites;
        match = jsoncons::ojson {"XXX"};
      },
      jsoncons::jsonpath::result_options::value);

  std::cout << "Number of rewrites: " << count_rewrites << '\n';
  std::cout << jsoncons::pretty_print(doc) << '\n';

and outputs:

Number of rewrites: 1
{
    "value": "XXX"
}

To be able to use the options argument I need to use the overload with the binary function. It would be nice to have the options available for the overload with the unary function. An overload with a mutable node reference would be nice too.

I would like to try to devise a PR that introduces a new overload of the json_replace() function template so that the following call would be possible:

  jsoncons::jsonpath::json_replace(doc, "$..value",
      [&count_rewrites](jsoncons::ojson& match) {
        ++count_rewrites;
        match = jsoncons::ojson {"XXX"};
      },
      jsoncons::jsonpath::result_options::value);

Even then, the problem remains when result_options has nodup and/or sort set. This issue should be documented.

danielaparker commented 1 year ago

Thanks for the comments. I need to give this some thought.

The binary callback version was introduced in 0.164.0, and intended to replace the unary call back version, although the unary version remains for backward compatibility. The binary version, unlike the unary version, allows selective in-place updating of the original document, using either properties of the passed value, or the location of the value in the original document.

Possibly result_options should not be made a parameter to json_replace, but set to a value that ensures safety.

wbangna commented 1 year ago

Possibly result_options should not be made a parameter to json_replace, but set to a value that ensures safety.

I think that would be a good solution.

I'm afraid that there is another problem with the json_rewrite() function:

#include <jsoncons/json.hpp>
#include <jsoncons/json_parser.hpp>
#include <jsoncons_ext/jsonpath/jsonpath.hpp>
#include <iostream>

int main()
{
  jsoncons::ojson doc = jsoncons::ojson::parse(R"({"value":"str"})");
  jsoncons::ojson rep = jsoncons::ojson::parse(R"({"value":"rew"})");

  jsoncons::jsonpath::json_replace(doc, "$..value",
      [rep](const std::string, jsoncons::ojson& match) {
        match = rep;
      }, jsoncons::jsonpath::result_options::value);

  std::cout << doc << '\n';
}

...this causes an infinit loop.

danielaparker commented 1 year ago

Okay, I won't have a chance to look at that until I get back from vacation.

danielaparker commented 1 year ago

This should be fixed on master. The fix follows your third suggestion, remove duplicates and sort by path, and loop through the selected nodes backwards, so the leaf nodes are visited first. The parameter result_options has been removed in all the json_replace overloads.