envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0
24.91k stars 4.79k forks source link

make the Json::sanitize method exception free #35598

Closed wbpcode closed 1 month ago

wbpcode commented 2 months ago

This is necessary for core data path usage and envoy-mobile.

The Json::sanitize may call the Nlohmann::Factory::serialize method to sanitize string that contains special character. And the Nlohmann::Factory::serialize will construct a json object and call its dump method.

Basically, the constructor won't throw because it won't validate the input string. But the dump() will throw by default if the input string contains invalid UTF8 characters. This exception could be suppressed by setting the error_handler to the replace. Then the invalid UTF-8 sequences will be replaced by U+FFFD.

This will break exist behavior, but is a quick way to remove exception from the Json::sanitize.

std::string Factory::serialize(absl::string_view str) {
  // The constructor will not throw even if the string is not valid UTF-8
  // because the constructor will only copy the str to the internal string
  // and won't validate it.
  nlohmann::json j(str);
  return j.dump(-1, ' ', false, nlohmann::json::error_handler_t::replace);
}
wbpcode commented 2 months ago

cc @jmarantz Considering the fact that the Json::sanitize is only used by the stats_render, is this an acceptable way to continue?

jmarantz commented 2 months ago

When I wrote the sanitizer I originally did not use (or even know about) nlohmann, and I had a lot of manual code to do the same thing, which was 2x faster, and exception free. See https://github.com/envoyproxy/envoy/pull/20428.

@htuch argued not to add that code to the codebase given the existing dependence on nlohmann, and I agreed that the speedup was not worth the code complexity. And the usage of the sanitizer at the time was all in the main thread, in the admin stats endpoint, where we did allow exceptions (compiled out for mobile).

I think if we want to be exception free I think we should revive #20428.

htuch commented 2 months ago

Why can't we just do what @wbpcode suggests and use an error handler? I'm not sure I see the need to import a 1k+ code doing non-trivial sanitization if we don't have to.

jmarantz commented 2 months ago

Maybe -- @wbpcode what did you mean by "this will break existing behavior". I think I keyed on that warning.

jmarantz commented 2 months ago

Also is it enough to avoid nlohmann throwing exceptions at runtime? Or do we need to compile all code linked into E-M with exceptions disabled?

wbpcode commented 2 months ago

Maybe -- @wbpcode what did you mean by "this will break existing behavior". I think I keyed on that warning.

In current implemention, if the string contains invalid utf-8, then all non-ascii chars will be replaced with \xxx, the xxx is the number value of byte.

See

         buffer.append(absl::StrFormat("\\%03o", c));

But if we use the way I suggested, only the invalid utf-8 will be replaced with U+FFFD

I personally think this is fine and more correct and this method is used in limited scope for now.****

cc @htuch @jmarantz

jmarantz commented 2 months ago

Why is u+fffd more correct? I am not sure if it's that important that we escape invalid utf8 the way it was done, but is it enough to avoid nlohmann throwing exceptions at runtime? Or do we need to compile all code linked into E-M with exceptions disabled?

wbpcode commented 2 months ago

Why is u+fffd more correct? 

Sorry, I didn't make it clear. Not the u+fffd is more correct. But the way to handle the other legal utf-8 in a string that contains invalid utf-8 is more correct.

For a string that contains invalid utf-8, the Nlohmann json at least will handle other legal utf-8 correctly. But the current implemention will convert all non-ascii include the legal utf-8 to \xxx.

wbpcode commented 2 months ago

but is it enough to avoid nlohmann throwing exceptions at runtime?

From source code and doc, yeah. Fuzz test could be used to ensure it.

Or do we need to compile all code linked into E-M with exceptions disabled?

Yeah. I think alyssawilk has done lots (super lots) of work to ensure this point. @alyssawilk

alyssawilk commented 2 months ago

+1 to the suggested error handler. I think this is a non-problem for E-M https://github.com/envoyproxy/envoy/pull/35607 but represents a potential config validation issue for envoy core (so is worth fixing)

wbpcode commented 2 months ago

Ensure the runtime exception issue again:

dump():

Throws type_error.316 if a string stored inside the JSON value is not UTF-8 encoded and error_handler is set to strict

By setting the error_handler to replace or ignore could suppress this behavior.

And I also do a quick check to the source code.


constructor:

Depend on the to_json implementation. No other known exception will be throwed from the doc. See https://json.nlohmann.me/api/basic_json/basic_json/#exception-safety

Specific to constructing json object from string view, only the allocator may throw exception which we basically will ignore. There is the core code snippet:


// This will be called by the constructor.

template<typename BasicJsonType, typename CompatibleString,
         enable_if_t<std::is_constructible<typename BasicJsonType::string_t, CompatibleString>::value, int> = 0>
inline void to_json(BasicJsonType& j, const CompatibleString& s)
{
    external_constructor<value_t::string>::construct(j, s);
}

// This will be called by the to_json implementation for string view.
    template < typename BasicJsonType, typename CompatibleStringType,
               enable_if_t < !std::is_same<CompatibleStringType, typename BasicJsonType::string_t>::value,
                             int > = 0 >
    static void construct(BasicJsonType& j, const CompatibleStringType& str)
    {
        j.m_data.m_value.destroy(j.m_data.m_type);
        j.m_data.m_type = value_t::string;
        j.m_data.m_value.string = j.template create<typename BasicJsonType::string_t>(str);
        j.assert_invariant();
    }

// This will be called by the construct implementation for string_view

    /// helper for exception-safe object creation
    template<typename T, typename... Args>
    JSON_HEDLEY_RETURNS_NON_NULL
    static T* create(Args&& ... args)
    {
        AllocatorType<T> alloc;
        using AllocatorTraits = std::allocator_traits<AllocatorType<T>>;

        auto deleter = [&](T * obj)
        {
            AllocatorTraits::deallocate(alloc, obj, 1);
        };
        std::unique_ptr<T, decltype(deleter)> obj(AllocatorTraits::allocate(alloc, 1), deleter);
        AllocatorTraits::construct(alloc, obj.get(), std::forward<Args>(args)...);
        JSON_ASSERT(obj != nullptr);
        return obj.release();
    }
wbpcode commented 2 months ago

I have split the escaping check to a separated function and them add a new utility method escape to implement the exception free sanitizing with only 4 additional lines code.

Utility::EscapedString Utility::escape(absl::string_view str) {
  if (!requireEscaping(str)) {
    return str;
  }
  return Nlohmann::Factory::serialize(str, true);
}

The utility method will only be used by the new JsonFormatter which requires that the users to enable it explicitly by setting runtime flag. See the latest commit in the #35545.

Will this way may help to eliminate some worries about the runtime exception? cc @jmarantz

cc @htuch @alyssawilk

We can make this method to be the only one implementation after it is battle tested. For example, after the JsonFormatter completely replaced the legacy JsonFormatter.

htuch commented 2 months ago

I guess I'm still a little opposed to bringing in the extra complexity for marginal benefit. Do we have a performance problem today? The 20x is a microbenchmark which will not show up in practice (unless I'm missing some context). Meanwhile, we permanently live with our new more complicated JSON formatter and sanitizer.

Why not just replace the invalid characters and move on? This feels like a non-problem.

jmarantz commented 2 months ago

I don't know if I have gotten a specific answer to this question, which I think is for @alyssawilk .

I think @wbpcode is proposing to add the json sanitizer as a dependency in e-m which I think it was not before. json sanitizer currently links in nlohmann which includes 'throw' in its compiled code.

I understand the strategy is that we would prevent nlohmann from calling throw, dynamically. However I thought it might still be a problem to link in a dependency that compiles in a reference to throw.

jmarantz commented 2 months ago

@alyssawilk you said "potential config validation issue for envoy core (so is worth fixing)". I did not undestand that phrase. Can you explain?

wbpcode commented 2 months ago

I guess I'm still a little opposed to bringing in the extra complexity for marginal benefit. Do we have a performance problem today? The 20x is a microbenchmark which will not show up in practice (unless I'm missing some context). Meanwhile, we permanently live with our new more complicated JSON formatter and sanitizer.

Why not just replace the invalid characters and move on? This feels like a non-problem.

Sorry, I think maybe I don't make it clear enough.

The core of #35545 is a new JsonFormatter implemention for access logging, with completely new design. It's 20x faster (and I think the code is also much simpler) than our legacy one and provide same performance with our text formatter. The poor performance of legacy JsonFormatter actually effects our users (see #35501, an user report similar problem and profiling result in that issue).

An exception free sanitize/escape is necessary for #35545 (our new JsonFormatter) because the new JsonFormatter will be used at core data paths and will be linked to e-m. But exist sanitize/escape depends on std:: exception directly. It doesn't meet my requirement. So I created this issue to discuss this problem. But note, these methods is not the core part of #35545.

My solution now is creat a new exception-free sanitize/escape method by replacing the invalid utf-8 (as my inital suggestion in this issue) for our new JsonFormatter usage. The new method only bring 4 additional lines code and is much simpler than previous method because we needn't to handle exception. I don't think it will bring extract complexity. It will simplify our code because the previous sanitize/escape will be replaced once the new one is proven that is stable.

So, I think you may misunderstood something (sorry for my previous unclear description.)

Cc @htuch

jmarantz commented 2 months ago

@htuch @alyssawilk

the context here is that @wbpcode is adding a new dependence on JsonSanitizer that requires it to be linked in and functional in envoy-mobile. Hence he is trying to remove the exception flow.

The benefit of this is in #35501 which is looking for a 20x improvement in json serialization during access loggers. I'm very much aligned with this -- I was able to get a huge improvement in json stats output before because the default model of populating a protobuf and then serializing that is a bag full of slow. We are iterating on some of the details of course.

I'm assuming this is necessary because json access loggers are needed in envoy-mobile; if they are not then maybe this change can be skipped.

alyssawilk commented 2 months ago

ah. so I think adding more exceptions to envoy is a poor choice given we're trying to move away from them due to the number of times we've found configs and queries of death where they were used improperly. if we can get the speed up without exceptions that makes it a clear win. all of that said envoy mobile doesn't use a json access logger AFIK (mobile/envoy_build_config/extensions_build_config.bzl) @wbpcode can verify by building //test/performance:test_binary_size in mobile - if it builds then there's a test-only dependency which can be easily sorted

htuch commented 2 months ago

OK, @wbpcode I think I see the reasonableness of the optimization in this use case, thanks for sharing. My vote is for whatever keeps exceptions from occurring (less opinionated about compile time, but sure) and whatever is simplest in implementation that achieves the end goal.

The one concern I have in all this code is it's incredibly fragile and easy to get wrong (enforcing sanitization, parsing, ensuring schema or valid JSON syntax on the way out), mistakes are security bugs in many cases, even for access logs.

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot] commented 1 month ago

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

jmarantz commented 1 month ago

Actually this was fixed in #36000