TykTechnologies / tyk

Tyk Open Source API Gateway written in Go, supporting REST, GraphQL, TCP and gRPC protocols
Other
9.58k stars 1.07k forks source link

[TT-12965] `ctx.GetOASDefinition()` has very poor performance #6471

Open JanMa opened 3 weeks ago

JanMa commented 3 weeks ago

Branch/Environment/Version

Describe the bug To work around some limitations in Tyks mTLS support (see #6259), I am maintaining a custom go plugin for mTLS at Paymenttools. We recently enhanced that plugin to support OAS APIs as well and noticed a large performance regression.

To get custom per-api plugin configuration at runtime, the plugin calls ctx.GetOASDefinition on the context object provided to the plugin. This function performs a deep copy of the stored *oas.OAS object before returning it which first marshals and then unmarshals the object to JSON.

This turns out to be a very expensive operation and severely degrades performance of your plugin if the function is called in the hot path for every incoming request. By removing this function and manually getting the *oas.OAS pointer from the passed context, we could reduce our memory usage by 95% and CPU usage by 46% which severely improved the overall performance of our Tyk setup.

Below you can see two generated flame graphs which show the difference (please ignore the color inconsistency).

Memory: mem_flamegraph

CPU: cpu_flamegraph

Proposed solution

We've been running this fix for some time now in production and could not notice any negative side effects which might be explained by not operating on a deep copy of the passed OAS definition. Therefore I propose to remove the unneeded deep copy of the *oas.OAS object before returning it.

buger commented 3 weeks ago

Hi! I wonder if you can overcome it by having in memory cache inside plugin. When api is reloaded your plugin will be reloaded in any case. What do you think?

JanMa commented 2 weeks ago

Hi @buger, we already make heavy use of caching for the authentication part of our plugin. The cache is initialized in the plugin's init function and the high level workflow is as follows:

  1. get OAS definition from passed context
  2. fetch API name and plugin config data
  3. check cache if the key <api name>-<client cert hash> already exists
    1. if yes, allow request
  4. perform client cert validation with config from fetched plugin config data
    1. if cert is valid, add key to cache and allow request

We use the same plugin for multiple APIs so we need it to be configurable and make use of the plugin config data. I guess we could work around fetching the OAS definition only once and then caching it based on some other value which we know without it :thinking: .

But this would mean we can't ensure that a new plugin config is immediately applied when the API config is changed in Tyk. Because we share the same plugin with multiple APIs, the plugin's init function is only triggered the first time it is loaded.

buger commented 2 weeks ago

When api is changed on Tyk side, your plugin is reloaded with api. So its cache kind will be reloaded as well.

blagerweij commented 1 week ago

@JanMa we noticed a similar performance hit in our plugin, out of curiosity how have you been able to work around the issue? Something like this ?

    v := r.Context().Value(ctx.OASDefinition)
    var apiDef *oas.OAS = nil
    if v != nil {
        apiDef = v.(*oas.OAS)
    }

And is your Go plugin an 'Auth' plugin, and are you using the Session state to cache the user, or are you using a custom cache?

JanMa commented 1 week ago

Hi @blagerweij , we worked around the issue like this:

    v := r.Context().Value(ctx.OASDefinition)
    if v == nil {
        w.WriteHeader(http.StatusInternalServerError)
        return
    }

    apidef, ok := v.(*oas.OAS)
    if !ok {
        w.WriteHeader(http.StatusInternalServerError)
        return
    }

The plugin is an Auth plugin, but for mutual TLS. This means we can't make use of the ID Extractor mechanism provided by Tyk because we have no auth information which is passed in a header. Instead we just use a go-cache cache which the plugin initializes in it's init() function.