Add integration tests for Worker

cognifloyd commented 2 years ago

Description

I keep tripping over myself as I'm developing the Kubernetes runtime in the Worker. I would like to find a way to test a lot more of the Worker in CI. The current unit tests don't give me confidence that everything works together.

So, let's add integration (or system, or whatever name you like) tests:

these tests should be able to run in CI or locally.
these tests are not end-to-end test that requires setting up server and worker
the scope of these tests should be limited to the worker, which includes executor and runtime.
the tests need to run on a matrix of possible executor+runtime configurations:
- linux + docker
- linux + kubernetes
- local + docker
- local + kubernetes

Value

Easier to compare runtimes or executors. Simplify developing new worker features. Increase confidence in the worker as a whole.

Definition of Done

We have tests running in CI in the worker repo that test each combo of executor+runtime.

Effort (Optional)

Probably a lot, but less than the effort I've expended back-tracking to fix what I keep breaking.

Impacted Personas (Optional)

Vela devs

cognifloyd commented 2 years ago

I think the exec() function is a good target for these integration tests.

func (w *Worker) exec(index int) error {

This is where we retrieve a build from the Queue, and then run the executor engine to do the build.

We will need:

Queue mock
- github.com/go-vela/server/queue/redis redis.NewTest()
- NewTest starts a local fake redis instance (miniredis)
- example usage: https://github.com/go-vela/server/blob/master/router/middleware/queue_test.go
Vela server mock
- github.com/go-vela/server/mock/server server.FakeHandler()
- FakeHandler returns an http.Handler that handles the Vela API and returns mock responses
- these responses are hard-coded
- the mock should probably be extended so we can configure it for different scenarios in server.FakeHandler()
- examples are in many of the executor tests: httptest.NewServer(server.FakeHandler())
mock clients in runtime (docker client, kubernetes client(s))

This is the Queue Item: https://github.com/go-vela/types/blob/master/item.go#L12-L18

// Item is the queue representation of an item to publish to the queue.
type Item struct {
    Build    *library.Build  `json:"build"`
    Pipeline *pipeline.Build `json:"pipeline"`
    Repo     *library.Repo   `json:"repo"`
    User     *library.User   `json:"user"`
}

These are the Vela Server APIs used in the Worker's executor and internal packages:

.Build.Update the mock should return the PUT body based on usage
.Log.GetService
.Log.GetStep
.Log.UpdateService the mock should return the PUT body based on usage
.Log.UpdateStep the mock should return the PUT body based on usage
.Secret.Get
.Step.Update the mock should return the PUT body based on usage
.Svc.Update the mock should return the PUT body based on usage

These Vela Server APIs are also used in the worker, but these tests probably won't need to handle them as they are only used in cmd/vela-worker/register.go, not in exec().

.Worker.Add
.Worker.Get
.Worker.Update

cognifloyd commented 2 years ago

OK. The biggest problem with testing exec() is that exec() initializes w.Runtime(), so we don't have a chance to inject mock runtime clients.

cognifloyd commented 2 years ago

Hmm. I do not see a good way to test that the build has completed. With the linux executor, I could use Vela server calls by adding some kind of tracking to the vela server mock. But the local executor does not report progress to the Vela server, so testing that is more difficult.

cognifloyd commented 2 years ago

We may need to finish extracting the things from the worker's main package into a new operator package.

As I understand, each worker has one operator. That operator is responsible for creating, tracking, and orchestrating all of the executors. (creating the executor also involves creating the runtime that is passed to the executor)

Doing this would make some of the testing a little more straight forward as mock's wouldn't have to be made quite so indirectly.

(based on feedback from @jbrockopp)

cognifloyd commented 2 years ago

here is a rough outline of what the Operator interface might look like. @jbrockopp wdyt?

type Operator interface {
    // replaces Worker.operate(), called in Worker.Start()
    Operate(context.Context)         error

    // called in Operate()
    CheckInLoop(context.Context)     error

    // replaces Worker.exec(), called in Operate()
    GetNextBuild()                   *types.Item
    CreateExecutor(int, *types.Item) error
    RunExecutor(int)                 error

    // called via api/middleware
    GetExecutors()                   (map[int]*executor.Engine, error)
    // called via api/middleware and in RunExecutor()
    GetExecutor(int)                 (*executor.Engine, error)
}

// implements Operator
type operator struct {...}

func (o *operator) Operate(ctx context.Context) error {
    executors, gctx := errgroup.WithContext(ctx)

    executors.Go(o.CheckInLoop)

    for i := 0; i < o.Config.Build.Limit; i++ {
        id := i
        executors.Go(func() error {
            select {
            case <-ctx.Done():
                return nil
            default:

                item, err := o.GetNextBuild()
                err := o.CreateExecutor(id, item)
                err := o.RunExecutor(id)
            }
        })
    }

    return executors.Wait()
}

Instead of adding "executors" to the gin context, we would add "operator", and then

call operator.GetExecutor(number) in router/middleware/executor/executor.go
call operator.GetExecutors() in api/executor.go GetExecutors()

All of the tests in this issue would be focused on the Operator.RunExecutor(int) method instead of the exec() method. Then it would be simple to create appropriate mocks for GetNextBuild and CreateExecutor, to effectively just test the executor+runtime.

go-vela / community