TykTechnologies / tyk

Tyk Open Source API Gateway written in Go, supporting REST, GraphQL, TCP and gRPC protocols
Other
9.72k stars 1.09k forks source link

Tyk should have an option to shutdown in case a Go native plugin is not loaded #5319

Open diegobernardes opened 1 year ago

diegobernardes commented 1 year ago

Is your feature request related to a problem? Please describe. We're using Go native plugins to implement critical logic that should be executed before hitting any of the internal services. The problem with Go plugins is that they can fail to load but Tyk will simply start without them.

Describe the solution you'd like An option at the configuration file like this:

{
  "shutdownOnFailedGoNativePluginLoad": true
}

If true and the plugin fails to load, tyk should shutdown.

Describe alternatives you've considered I've implemented a proxy application that gets executed instead of tyk and then the application starts tyk and parse the logs during the startup. If it founds a log entry with Could not load Go-plugin it kills Tyk and exit. This ensures that Tyk is always executing with the plugin.

Additional context This is the proxy application that I've implemented.

package main

import (
    "bufio"
    "fmt"
    "os"
    "os/exec"
    "os/signal"
    "regexp"
    "syscall"
    "time"
)

func main() {
    cmd := exec.Command("./tyk")
    cmdReader, err := cmd.StdoutPipe()
    if err != nil {
        fmt.Fprintln(os.Stderr, "failed to create the pip to execute the command: %w", err)
        os.Exit(1)
    }
    cmd.Stderr = cmd.Stdout

    scanner := bufio.NewScanner(cmdReader)
    go func() {
        checkLog := true
        t := time.Now().Add(time.Minute)
        r, err := regexp.Compile("Could not load Go-plugin")
        if err != nil {
            fmt.Fprintln(os.Stderr, "failed to compile the regex: %w", err)
            os.Exit(1)
        }
        for scanner.Scan() {
            fmt.Fprintln(os.Stdout, scanner.Text())

            // This error happens just when tyk is starting so we don't need to analize the logs forever.
            if checkLog {
                if r.Find(scanner.Bytes()) != nil {
                    fmt.Fprintln(os.Stderr, "tyk failed to load the plugin")
                    os.Exit(1)
                }
                if time.Now().After(t) {
                    checkLog = false
                }
            }
        }
    }()

    if err := cmd.Start(); err != nil {
        fmt.Fprintln(os.Stderr, "Error starting Cmd", err)
        os.Exit(1)
    }

    c := make(chan os.Signal, 1)
    signal.Notify(c, os.Interrupt, syscall.SIGTERM)
    <-c
    if err := cmd.Process.Signal(os.Interrupt); err != nil {
        fmt.Fprintln(os.Stderr, "Failed to send a signal to stop the process: %w", err)
        os.Exit(1)
    }

    if err := cmd.Wait(); err != nil {
        fmt.Fprintln(os.Stderr, "Error waiting for Cmd", err)
        os.Exit(1)
    }
}
andyo-tyk commented 1 year ago

Hi @diegobernardes,

Thanks for getting in touch and taking the time to contribute your idea.

In Tyk 4.0.13 and 5.0.1 we introduced a change so that, if a Go Plugin fails to load correctly, Tyk will log an error but will continue to operate normally.

If a call is made to the API where the Go Plugin failed to load, then Tyk will return HTTP 500 and will not proxy the request upstream.

We felt that this was an appropriate behaviour for the Gateway - it protects your upstream service while not impacting your other APIs. By monitoring the error log during API load, you'd spot any incorrectly loaded plugins.

We've also, in v5.0.3, added the same behaviour for JS plugins.

https://github.com/TykTechnologies/tyk/releases/tag/v4.0.13 https://github.com/TykTechnologies/tyk/releases/tag/v5.0.1

I hope this helps - thanks again for supporting Tyk!

diegobernardes commented 1 year ago

Hi @diegobernardes,

Thanks for getting in touch and taking the time to contribute your idea.

In Tyk 4.0.13 and 5.0.1 we introduced a change so that, if a Go Plugin fails to load correctly, Tyk will log an error but will continue to operate normally.

If a call is made to the API where the Go Plugin failed to load, then Tyk will return HTTP 500 and will not proxy the request upstream.

We felt that this was an appropriate behaviour for the Gateway - it protects your upstream service while not impacting your other APIs. By monitoring the error log during API load, you'd spot any incorrectly loaded plugins.

We've also, in v5.0.3, added the same behaviour for JS plugins.

https://github.com/TykTechnologies/tyk/releases/tag/v4.0.13 https://github.com/TykTechnologies/tyk/releases/tag/v5.0.1

I hope this helps - thanks again for supporting Tyk!

But do you think that the startup check is still an option? You can have Tyk running non stop and update it as the API definitions change from the management console or the k8s operator, but you can also, and that's our case, have a centralized repository containing the API definitions and when anything changes, redeploy Tyk. The problem with the 500 is that we're pushing to runtime something that can be validated, at some cases, during the startup. Another problem with the runtime checks is that once you have enough endpoints in Tyk you'll find that some have way more throughput then others and it starts to get harder to create proper alerts because 1k requests failing on a given endpoint is nothing, but on another could represent a 50% failure rate.

buger commented 1 year ago

If this API with Go plugin is very important for you, then you can change the health check of your application group/pod group to point to that API directly. E.g. if API has not loaded and not respond properly, healthcheck has failed.

andyo-tyk commented 12 months ago

Hi @diegobernardes,

Did you manage to try @buger's suggestion of using the healthcheck?

I'm interested to understand the use case further as to why you prefer to redeploy the Gateway (and tear it down again if a single Plugin fails to load).

Do you find it more efficient to re-deploy Tyk for each change, rather than using Tyk's hot-reload functionality?

diegobernardes commented 9 months ago

Hi @diegobernardes,

Did you manage to try @buger's suggestion of using the healthcheck?

I'm interested to understand the use case further as to why you prefer to redeploy the Gateway (and tear it down again if a single Plugin fails to load).

Do you find it more efficient to re-deploy Tyk for each change, rather than using Tyk's hot-reload functionality?

Hey, sorry for the late reply, totally missed the notification from this issue. Yes, I changed to what @buger has suggested, but I still think that this should be an option at tyk and not something that we, as clients, need to work around. I've created a virtual endpoint at the Go plugin and mapped it as a health endpoint that returns a 200.

Regards to why we redeploy instead of hot-reload: