danielaparker / jsoncons

A C++, header-only library for constructing JSON and JSON-like data formats, with JSON Pointer, JSON Patch, JSON Schema, JSONPath, JMESPath, CSV, MessagePack, CBOR, BSON, UBJSON
https://danielaparker.github.io/jsoncons

Replace in JSON string #251

Closed rbroggi closed 4 years ago

rbroggi commented 4 years ago

Dear Daniel,

Thank you once again for the product. My team loves it and we use it heavily in our project.

We are currently facing a challenge, and I would like to ask your advice on how to approach it with jsoncons:

  1. We have a JSON string.
  2. We are using JSONPath to find a specific item in our JSON.
  3. [HERE is the challenge] - we need to replace the found item (a string) in the original string with another string of the same size, without modifying anything else in the original string.

We want to avoid finding the item and then doing a plain string find-and-replace on the original string, because the found item can be a very short string (e.g. "abc"), and replacing it naively could hit several false positives.

Example:

Considering the following JSON string :

{"items":[{"id":1,"name":"abc","expiry":"0420"},{"id":2,"name":"id","expiry":"0720"}]}

and the following JSONPath: $.items[1].name

I would retrieve the string "id".

I would like to substitute the found string "id" with the string "ab" in the original message without modifying anything else in it, generating the following:

{"items":[{"id":1,"name":"abc","expiry":"0420"},{"id":2,"name":"ab","expiry":"0720"}]}

If I were to naively find and replace "id" with "ab" in the raw string, it would lead to the following string:

{"items":[{"ab":1,"name":"abc","expiry":"0420"},{"ab":2,"name":"ab","expiry":"0720"}]}
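The false positives can be reproduced with a few lines of plain C++. This is only a minimal sketch of the naive approach being ruled out; `naive_replace_all` is a hypothetical helper, not part of jsoncons, and it deliberately knows nothing about JSON structure.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replaces every occurrence of `from` in `text` with `to`.
// Keys and values are treated alike, which is exactly the problem:
// there is no way to target only the value found by the JSONPath query.
std::string naive_replace_all(std::string text,
                              const std::string& from,
                              const std::string& to)
{
    assert(!from.empty());
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos)
    {
        text.replace(pos, from.size(), to);
        pos += to.size(); // continue searching after the inserted text
    }
    return text;
}
```

Calling it with the quoted tokens `"id"` and `"ab"` on the example message rewrites both "id" keys as well as the intended value.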

How would you go about this problem? Thank you once again.

danielaparker commented 4 years ago

Does it have to be an in-place update? If not, you could use a json filter, e.g.

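#include <iostream>
#include <string>
#include <jsoncons/json.hpp>
#include <jsoncons/json_filter.hpp>

class my_filter : public jsoncons::json_filter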
{
    using jsoncons::json_filter::string_view_type;

    std::string from_;
    std::string to_;
public:
    my_filter(jsoncons::json_visitor& destination)
        : jsoncons::json_filter(destination), from_("id"), to_("ab")
    {

    }
    bool visit_string(const string_view_type& value,
                      jsoncons::semantic_tag tag,
                      const jsoncons::ser_context& context,
                      std::error_code& ec) override
    {
        if (value == from_)
        {
            return this->destination().string_value(to_,
                                                    tag,
                                                    context,
                                                    ec);
        }
        else
        {
            return this->destination().string_value(value,
                                                    tag,
                                                    context,
                                                    ec);
        }
    }
};

int main()
{
    std::string input = R"({"items": [{"id":1, "name" : "abc", "expiry" : "0420"}, { "id":2,"name" : "id","expiry" : "0720" }] })";

    std::string output;
    jsoncons::compact_json_string_encoder encoder(output);
    my_filter filter(encoder);

    jsoncons::json_reader reader(input, filter);
    reader.read();
    std::cout << output << "\n";
}

Output:

{"items":[{"id":1,"name":"abc","expiry":"0420"},{"id":2,"name":"ab","expiry":"0720"}]}

Or alternatively, and consistent with using JSONPath to find the item, you could use jsoncons::jsonpath::json_replace.

But if it must be in-place, and this will currently work only if your json text contains no unescaped newlines, you could try this:

#include <algorithm>
#include <iostream>
#include <string>
#include <jsoncons/json.hpp>

class my_in_place_updater : public jsoncons::default_json_visitor
{
    char* data_;
    std::size_t length_;
    std::string from_;
    std::string to_;
public:
    using jsoncons::default_json_visitor::string_view_type;

    my_in_place_updater(char* data, std::size_t length)
        : data_(data), length_(length), from_("id"), to_("ab")
    {

    }
    bool visit_string(const string_view_type& value,
                      jsoncons::semantic_tag,
                      const jsoncons::ser_context& context,
                      std::error_code& ec) override
    {
        // input cannot contain unescaped newlines
        if (context.line() != 1)
        {
            ec = jsoncons::json_errc::source_error;
            return false;
        }

        if (value == from_)
        {
            std::copy(to_.begin(), to_.end(), data_ + (context.column() - 1));
        }
        return true;
    }
};

int main()
{
    std::string input = R"({"items": [{"id":1, "name" : "abc", "expiry" : "0420"}, { "id":2,"name" : "id","expiry" : "0720" }] })";

    my_in_place_updater updater(input.data(), input.size());

    jsoncons::json_reader reader(input, updater);
    reader.read();
    std::cout << input << "\n";
}

Output:

{"items": [{"id":1, "name" : "abc", "expiry" : "0420"}, { "id":2,"name" : "ab","expiry" : "0720" }] }

Using context.column() - 1 to get the current position will work only if the JSON text doesn't have unescaped newlines, so that context.line() == 1. If users thought it useful, we could add a position() member function to the ser_context, which wouldn't have that limitation.

rbroggi commented 4 years ago

Hey, Daniel,

That's awesome! This lib is really a pearl. I think the 'position()' method could be a nice idea for user experience. Out of curiosity, why is it that only 'unescaped newlines' limit the depicted approach? Would other regular whitespace characters not interfere at all?

Thanks once again, Cheers from rainy Milan

danielaparker commented 4 years ago

By newline I mean any of \r, \r\n or \n occurring in the JSON text. The parser keeps track of line and column number, which are one-based. column-1 only corresponds to position if line == 1. If we were to support a position() member function on the ser_context, it wouldn't have that limitation. Note that escaped newline characters inside quoted strings aren't a problem. Regular space characters aren't a problem either, because column counts them too.
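To make the line/column point concrete, here is a rough sketch of how a one-based (line, column) pair relates to an absolute offset. `offset_of` is a hypothetical helper written for illustration only; it is not how jsoncons computes position(), and it assumes columns count bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Converts a one-based (line, column) pair into a zero-based offset into
// `text`, treating "\r\n", "\r" and "\n" each as a single line break.
// Returns std::string::npos if `line` lies beyond the end of the text.
std::size_t offset_of(const std::string& text, std::size_t line, std::size_t column)
{
    assert(line >= 1 && column >= 1);
    std::size_t current_line = 1;
    std::size_t i = 0;
    while (current_line < line && i < text.size())
    {
        if (text[i] == '\r')
        {
            // "\r\n" counts as one line break, so skip the '\n' as well
            if (i + 1 < text.size() && text[i + 1] == '\n') ++i;
            ++current_line;
        }
        else if (text[i] == '\n')
        {
            ++current_line;
        }
        ++i;
    }
    if (current_line < line) return std::string::npos;
    return i + (column - 1);
}
```

On line 1 the result is simply column - 1, which is why the earlier workaround holds there. On any later line, the characters of all preceding lines have to be added, and that is the information a position() member would carry directly.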

rbroggi commented 4 years ago

In your proposal of having the 'position ()' method, do you think it would be possible to have it as the real position of the cursor, taking into account characters from previous lines? That would be exactly what we would need.

danielaparker commented 4 years ago

Yes.

danielaparker commented 4 years ago

Update JSON text in-place with position() (supported in v0.153.3)

#include <algorithm>
#include <cassert>
#include <cstdio>
#include <iostream>
#include <string>
#include <jsoncons/json.hpp>

class my_in_place_updater : public jsoncons::default_json_visitor
{
    char* data_;
    std::size_t length_;
    std::string from_;
    std::string to_;
public:
    using jsoncons::default_json_visitor::string_view_type;

    my_in_place_updater(char* data, std::size_t length, 
                        const std::string& from, const std::string& to)
        : data_(data), length_(length), 
          from_(from), to_(to)
    {

    }
    bool visit_string(const string_view_type& value,
                      jsoncons::semantic_tag,
                      const jsoncons::ser_context& context,
                      std::error_code&) override
    {
        assert(context.position() + to_.size() < length_);
        if (value == from_)
        {
            std::copy(to_.begin(), to_.end(), data_ + context.position());
        }
        return true;
    }
};

int main()
{
    std::string format = "{\"items\"\n:\n \n[\n{\"id\"\n:\n1\n,\n \"name\" \n:\n \"abc\"\n,\n \"expiry\" \n:\n \"0420\"\n}\n\n,\n { \"id\"\n:\n2\n,\n\"name\" \n:\n \"%s\"\n,\n\"expiry\" \n:\n \"0720\" \n}\n\n]\n \n}";

    char input[500];
    int length = snprintf(input, 500, format.c_str(), "id");
    //std::cout << "(1)\n" << input << "\n";
    char expected[500];
    snprintf(expected, 500, format.c_str(), "ab");

    my_in_place_updater updater(input, (size_t)length, "id", "ab");
    jsoncons::json_reader reader(jsoncons::string_view(input), updater);
    reader.read();
    assert(std::string(input) == std::string(expected));
}

Alexandre-Dhalenne commented 4 years ago

Hello,

I am wondering how this could work with a JSONPath. Let me explain: how can I replace, in the JSON string, the value corresponding to a JSONPath without modifying anything else in the original string?

For instance, given this JSON:

{
"Cola":{

"Type":"Drink"
},
"Water":{

"Type":"Drink"

}

}

I want to change the type of Cola to SoftDrink. The JSONPath is $.Cola.Type

Is there a way to get the ser_context from json_query, so that I can get the position() and replace exactly where I want? The goal is to keep the rest of the original string untouched (this is why I don't want to use json_replace).

Thanks,

danielaparker commented 4 years ago

Currently, the JSONPath implementation requires decoding the JSON into a basic_json value before evaluating the JSONPath expression, and at that point all information about position has been lost. So that's not currently possible.

At some point I'd like to implement a streaming mode of processing, which would retain position, but that's some way off.

danielaparker commented 4 years ago

I guess you could do it like this (delay modifying the input until the parser has read past the right brace):

#include <iostream>
#include <jsoncons/json.hpp>
#include <cassert>

using namespace jsoncons;

class my_in_place_updater : public jsoncons::default_json_visitor
{
    char* data_;
    std::size_t length_;
    std::vector<std::string> path_;
    std::string from_;
    std::string to_;
    std::vector<std::string> current_;
    std::vector<std::size_t> positions_;
public:
    using jsoncons::default_json_visitor::string_view_type;

    my_in_place_updater(char* data, std::size_t length,
                        const std::vector<std::string>& path, 
                        const std::string& from, const std::string& to)
        : data_(data), length_(length),
          path_(path), from_(from), to_(to)
    {
    }

    bool visit_begin_object(semantic_tag, const ser_context&, std::error_code&) override
    {
        current_.emplace_back();
        return true;
    }

    bool visit_end_object(const ser_context&, std::error_code&) override
    {
        for (auto pos : positions_)
        {
            std::copy(to_.begin(), to_.end(), data_ + pos);
        }
        positions_.clear();
        current_.pop_back();
        return true;
    }

    bool visit_key(const string_view_type& key, const ser_context&, std::error_code&) override
    {
        current_.back() = key;
        return true;
    }

    bool visit_string(const string_view_type& value,
        jsoncons::semantic_tag,
        const jsoncons::ser_context& context,
        std::error_code&) override
    {
        if (path_ == current_ && value == from_)
        {
            assert(context.position() + to_.size() < length_);
            positions_.push_back(context.position());

        }
        return true;
    }
};

int main()
{
    std::string input = R"(
{
    "Cola" : {"Type":"Drink"      },"Water" : {"Type":"Drink"}
}
    )";

    try
    {
        my_in_place_updater updater(input.data(), input.size(), { "Cola","Type" }, "Drink", "SoftDrink\"");
        jsoncons::json_reader reader(jsoncons::string_view(input), updater);
        reader.read();

        std::cout << input << "\n";
    }
    catch (std::exception& e)
    {
        std::cout << e.what() << "\n";
    }
}

Output:

{
    "Cola" : {"Type":"SoftDrink"  },"Water" : {"Type":"Drink"}
}

Alexandre-Dhalenne commented 4 years ago

Wow, thanks a lot! I will try it right away.

danielaparker commented 4 years ago

But this way is much safer: using a visitor to collect the positions of "from" in the input, and then using string replace to make the substitution. It doesn't rely on there being a fortuitous amount of space available in the input.

#include <iostream>
#include <jsoncons/json.hpp>
#include <cassert>

using namespace jsoncons;

class string_locator : public jsoncons::default_json_visitor
{
    char* data_;
    std::size_t length_;
    std::vector<std::string> path_;
    std::string from_;
    std::vector<std::string> current_;
    std::vector<std::size_t>& positions_;
public:
    using jsoncons::default_json_visitor::string_view_type;

    string_locator(char* data, std::size_t length,
                        const std::vector<std::string>& path, 
                        const std::string& from, std::vector<std::size_t>& positions)
        : data_(data), length_(length),
          path_(path), from_(from), positions_(positions)
    {
    }

    bool visit_begin_object(semantic_tag, const ser_context&, std::error_code&) override
    {
        current_.emplace_back();
        return true;
    }

    bool visit_end_object(const ser_context&, std::error_code&) override
    {
        current_.pop_back();
        return true;
    }

    bool visit_key(const string_view_type& key, const ser_context&, std::error_code&) override
    {
        current_.back() = key;
        return true;
    }

    bool visit_string(const string_view_type& value,
        jsoncons::semantic_tag,
        const jsoncons::ser_context& context,
        std::error_code&) override
    {
        if (path_ == current_ && value == from_)
        {
            positions_.push_back(context.position());

        }
        return true;
    }
};

void update_in_place(std::string& input,
    const std::vector<std::string>& path,
    const std::string& from,
    const std::string& to)
{
    std::vector<std::size_t> positions;
    string_locator updater(input.data(), input.size(), path, from, positions);
    jsoncons::json_reader reader(jsoncons::string_view(input), updater);
    reader.read();

    for (auto it = positions.rbegin(); it != positions.rend(); ++it)
    {
        input.replace(*it, from.size(), to);
    }
}

int main()
{
    std::string input = R"(
{
    "Cola" : {"Type":"Drink"},"Water" : {"Type":"Drink"}
}
    )";

    try
    {
        update_in_place(input, {"Cola", "Type"}, "Drink", "SoftDrink");

        std::cout << input << "\n";
    }
    catch (std::exception& e)
    {
        std::cout << e.what() << "\n";
    }
}

Output:

{
    "Cola" : {"Type":"SoftDrink"},"Water" : {"Type":"Drink"}
}
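One detail in update_in_place worth spelling out is the reverse iteration over the collected positions. The sketch below isolates that step in plain C++; `replace_at` is a hypothetical name, and the offsets are assumed to be sorted in ascending order and to refer to the unmodified text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Replaces `from_len` characters at each offset in `positions` with `to`.
// Working from the last offset to the first keeps the earlier offsets
// valid even when `to` is longer or shorter than the text it replaces,
// because each replacement only shifts characters that come after it.
std::string replace_at(std::string text,
                       const std::vector<std::size_t>& positions,
                       std::size_t from_len,
                       const std::string& to)
{
    for (auto it = positions.rbegin(); it != positions.rend(); ++it)
    {
        assert(*it + from_len <= text.size());
        text.replace(*it, from_len, to);
    }
    return text;
}
```

Going front to back instead would require adjusting every later offset by the difference in length after each replacement.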

danielaparker commented 4 years ago

And here's a version that supports relative paths:

#include <iostream>
#include <jsoncons/json.hpp>
#include <cassert>

using namespace jsoncons;

class string_locator : public jsoncons::default_json_visitor
{
    char* data_;
    std::size_t length_;
    std::vector<std::string> path_;
    std::string from_;
    std::vector<std::string> current_;
    std::vector<std::size_t>& positions_;
public:
    using jsoncons::default_json_visitor::string_view_type;

    string_locator(char* data, std::size_t length,
                        const std::vector<std::string>& path, 
                        const std::string& from, std::vector<std::size_t>& positions)
        : data_(data), length_(length),
          path_(path), from_(from), positions_(positions)
    {
    }

    bool visit_begin_object(semantic_tag, const ser_context&, std::error_code&) override
    {
        current_.emplace_back();
        return true;
    }

    bool visit_end_object(const ser_context&, std::error_code&) override
    {
        current_.pop_back();
        return true;
    }

    bool visit_key(const string_view_type& key, const ser_context&, std::error_code&) override
    {
        current_.back() = key;
        return true;
    }

    bool visit_string(const string_view_type& value,
                      jsoncons::semantic_tag,
                      const jsoncons::ser_context& context,
                      std::error_code&) override
    {
        if (path_.size() <= current_.size() && std::equal(path_.rbegin(), path_.rend(), current_.rbegin()))
        {
            if (value == from_)
            {
                positions_.push_back(context.position());

            }
        }
        return true;
    }
};

void update_in_place(std::string& input,
                     const std::vector<std::string>& path,
                     const std::string& from,
                     const std::string& to)
{
    std::vector<std::size_t> positions;
    string_locator updater(input.data(), input.size(), path, from, positions);
    jsoncons::json_reader reader(jsoncons::string_view(input), updater);
    reader.read();

    for (auto it = positions.rbegin(); it != positions.rend(); ++it)
    {
        input.replace(*it, from.size(), to);
    }
}

int main()
{
    std::string input = R"(
{
    "Cola" : {"Type":"Drink"},"Water" : {"Type":"Drink"}, "Extra" : {"Cola" : {"Type":"Drink"}}
}
    )";

    try
    {
        update_in_place(input, {"Cola", "Type"}, "Drink", "SoftDrink");

        std::cout << input << "\n";
    }
    catch (std::exception& e)
    {
        std::cout << e.what() << "\n";
    }
}

Output:

{
    "Cola" : {"Type":"SoftDrink"},"Water" : {"Type":"Drink"}, "Extra" : {"Cola" : {"Type":"SoftDrink"}}
}
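The relative-path test in visit_string is a suffix comparison between the requested path and the stack of keys currently open. Pulled out on its own, as a small sketch with a hypothetical function name, it looks like this:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns true when `path` matches the tail of `current`, i.e. when the
// last path.size() keys on the stack equal `path` in the same order.
// The size check comes first so std::equal never reads past the
// beginning of `current`.
bool matches_suffix(const std::vector<std::string>& path,
                    const std::vector<std::string>& current)
{
    return path.size() <= current.size()
        && std::equal(path.rbegin(), path.rend(), current.rbegin());
}
```

With path {"Cola", "Type"}, both the stack {"Cola", "Type"} and the stack {"Extra", "Cola", "Type"} match, which is why both occurrences are replaced in the output above.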

Alexandre-Dhalenne commented 4 years ago

Hey,

So after some work I managed to do what I wanted, thanks to your solution. To retrieve the position of an element from a JSONPath, I use a normalized path: I build the normalized path of the current location as the JSON is parsed, and whenever a value is read I check whether it sits at the target normalized path. If it does, I store its position. Source:

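#include <iostream>
#include <string>
#include <vector>
#include <jsoncons/json.hpp>
#include <jsoncons_ext/jsonpath/jsonpath.hpp>

using namespace jsoncons;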

class string_locator : public jsoncons::default_json_visitor
{
    char* data_;
    std::size_t length_;
    std::string path_;
    std::string from_;
    std::vector<std::string> current_;
    std::vector<std::size_t>& positions_;

    std::vector < std::pair<int,int>> arrayIndexes; //Position in current_, value

public:
    using jsoncons::default_json_visitor::string_view_type;

    string_locator(char* data, std::size_t length,
        const std::string& path,
        const std::string& from, std::vector<std::size_t>& positions)
        : data_(data), length_(length),
        path_(path),
        from_(from),
        positions_(positions)
    {
    }

    std::string buildNormalizedPath(const std::vector<std::string>& iKeyList)
    {
        //Init
        std::string aNormalizedPath = "$";

        //For each key in the current stack
        for (auto& key : iKeyList)
        {
            aNormalizedPath += "[" + key + "]";
        }
        return aNormalizedPath;
    }

    bool custom_visit(const ser_context& context)
    {
        std::string aNormPath;
        if (arrayIndexes.size() > 0)
        {
            auto& [pos, val] = arrayIndexes.back();
            current_.at(pos) = std::to_string(val);
            aNormPath = buildNormalizedPath(current_);
            val += 1;
        }
        else
        {
            aNormPath = buildNormalizedPath(current_);
        }
        std::cout << aNormPath << std::endl;
        if (path_ == aNormPath)
        {
            positions_.push_back(context.position());
        }
        return true;
    }

    bool visit_begin_object(semantic_tag, const ser_context&, std::error_code&) override
    {
        current_.emplace_back();
        return true;
    }

    bool visit_end_object(const ser_context&, std::error_code&) override
    {
        current_.pop_back();
        return true;
    }

    bool visit_key(const string_view_type& key, const ser_context&, std::error_code&) override
    {
        current_.back() = "'"+std::string(key)+"'";
        return true;
    }

    bool visit_string(const string_view_type& value,
        jsoncons::semantic_tag,
        const jsoncons::ser_context& context,
        std::error_code&) override
    {
        return custom_visit(context);
    }

    bool visit_begin_array(semantic_tag, const ser_context&, std::error_code& ec) override
    {
        current_.emplace_back(std::to_string(0));
        arrayIndexes.emplace_back(std::make_pair(current_.size()-1,0));
        return true;
    }

    bool visit_end_array(const ser_context&, std::error_code& ec) override
    {
        current_.pop_back();
        arrayIndexes.pop_back();
        return true;
    }

    bool visit_null(semantic_tag, const ser_context&, std::error_code& ec) override
    {
        return true;
    }

    bool visit_byte_string(const byte_string_view&, semantic_tag, const ser_context& context, std::error_code& ec) override
    {
        return custom_visit(context);
    }

    bool visit_uint64(uint64_t, semantic_tag, const ser_context& context, std::error_code& ec) override
    {
        return custom_visit(context);
    }

    bool visit_int64(int64_t, semantic_tag, const ser_context& context, std::error_code& ec) override
    {
        return custom_visit(context);
    }

    bool visit_half(uint16_t, semantic_tag, const ser_context& context, std::error_code& ec) override
    {
        return custom_visit(context);
    }

    bool visit_double(double, semantic_tag, const ser_context& context, std::error_code& ec) override
    {
        return custom_visit(context);
    }

    bool visit_bool(bool, semantic_tag, const ser_context& context, std::error_code& ec) override
    {
        return custom_visit(context);
    }
};

void update_in_place(std::string& input,
    const std::string& path,
    const std::string& from,
    const std::string& to)
{
    std::vector<std::size_t> positions;
    string_locator updater(input.data(), input.size(), path, from, positions);
    jsoncons::json_reader reader(jsoncons::string_view(input), updater);
    reader.read();

    for (auto it = positions.rbegin(); it != positions.rend(); ++it)
    {
        std::cout << "Position : " << *it << std::endl;
        input.replace(*it, from.size(), to);
    }
}

int main()
{

    std::string input = R"(
    {
        "Cola" : {"Type":"Drink","List":[{"Child":"A","children":["TEST"]},{"Child":"B"}]},"Water" : {"Type":"Drink"}
    }
    )";

    try
    {
        update_in_place(input, "$['Cola']['List'][0]['Child']", "A", "SoftDrink");
        //std::cout << input << std::endl;
        auto myJson = json::parse(input);
        auto results = jsoncons::jsonpath::flatten(myJson);
        std::cout << (input) << "\n" << std::endl;
    }
    catch (std::exception& e)
    {
        std::cout << e.what() << "\n";
    }

} 

Output:

{
        "Cola" : {"Type":"Drink","List":[{"Child":"SoftDrink","children":["TEST"]},{"Child":"B"}]},"Water" : {"Type":"Drink"}
    }

Thanks a lot for your work. I hope this can help too. And thanks @rbroggi for his contribution, he helped me find a solution 😄

Cheers

danielaparker commented 4 years ago

Nice work! Just FYI, the visit_half and visit_byte_string events will never be generated by a JSON parser, so you don't need to handle those.
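One more observation on the normalized-path visitor above: it appears to advance the array index only when a scalar value is visited, so the index may drift when an array element is an object with more than one member. A possible way around that is to advance the index whenever any value, scalar or container, starts inside an array. The following is a library-independent sketch of that bookkeeping; `path_tracker` and its member names are hypothetical, and the idea is that each member would be called from the matching visit_* override.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Tracks the normalized path of the value currently being visited.
// Each open container contributes one step; an array step also carries
// the index that its next element will receive.
class path_tracker
{
    struct step
    {
        bool is_array;
        std::size_t next_index; // meaningful only when is_array is true
        std::string text;       // "['key']" or "[index]"
    };
    std::vector<step> steps_;

    // Called when any value begins: if the enclosing container is an
    // array, assign the next index to it.
    void advance()
    {
        if (!steps_.empty() && steps_.back().is_array)
        {
            step& s = steps_.back();
            s.text = "[" + std::to_string(s.next_index++) + "]";
        }
    }
public:
    void begin_object() { advance(); steps_.push_back({false, 0, ""}); }
    void begin_array() { advance(); steps_.push_back({true, 0, ""}); }
    void end_container() { steps_.pop_back(); }
    void key(const std::string& name) { steps_.back().text = "['" + name + "']"; }

    // Call once for every scalar value; returns its normalized path.
    std::string value()
    {
        advance();
        std::string path = "$";
        for (const step& s : steps_) path += s.text;
        return path;
    }
};
```

Replaying the events for the "Cola" example through this tracker yields $['Cola']['List'][0]['Child'] for "A" and $['Cola']['List'][1]['Child'] for "B", regardless of how many members the first element has.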