go-vela / community

Community Information for Vela (Target's official Pipeline Automation Framework)
https://go-vela.github.io/docs/
Apache License 2.0
22 stars 3 forks source link

Add integration tests for Worker #585

Open cognifloyd opened 2 years ago

cognifloyd commented 2 years ago

Description

I keep tripping over myself as I'm developing the Kubernetes runtime in the Worker. I would like to find a way to test a lot more of the Worker in CI. The current unit tests don't give me confidence that everything works together.

So, let's add integration (or system, or whatever name you like) tests:

Value

Easier to compare runtimes or executors. Simplify developing new worker features. Increase confidence in the worker as a whole.

Definition of Done

We have tests running in CI in the worker repo that test each combo of executor+runtime.

Effort (Optional)

Probably a lot, but less than the effort I've expended back-tracking to fix what I keep breaking.

Impacted Personas (Optional)

Vela devs

cognifloyd commented 2 years ago

I think the exec() function is a good target for these integration tests.

func (w *Worker) exec(index int) error {

This is where we retrieve a build from the Queue, and then run the executor engine to do the build.

We will need:

This is the Queue Item: https://github.com/go-vela/types/blob/master/item.go#L12-L18

// Item is the queue representation of an item to publish to the queue.
type Item struct {
    Build    *library.Build  `json:"build"`
    Pipeline *pipeline.Build `json:"pipeline"`
    Repo     *library.Repo   `json:"repo"`
    User     *library.User   `json:"user"`
}

These are the Vela Server APIs used in the Worker's executor and internal packages:

These Vela Server APIs are also used in the worker, but these tests probably won't need to handle them as they are only used in cmd/vela-worker/register.go, not in exec().

cognifloyd commented 2 years ago

OK. The biggest problem with testing exec() is that exec() initializes w.Runtime(), so we don't have a chance to inject mock runtime clients.

cognifloyd commented 2 years ago

Hmm. I do not see a good way to test that the build has completed. With the linux executor, I could use Vela server calls by adding some kind of tracking to the vela server mock. But the local executor does not report progress to the Vela server, so testing that is more difficult.

cognifloyd commented 2 years ago

We may need to finish extracting the things from the worker's main package into a new operator package.

As I understand, each worker has one operator. That operator is responsible for creating, tracking, and orchestrating all of the executors. (creating the executor also involves creating the runtime that is passed to the executor)

Doing this would make some of the testing a little more straight forward as mock's wouldn't have to be made quite so indirectly.

(based on feedback from @jbrockopp)

cognifloyd commented 2 years ago

here is a rough outline of what the Operator interface might look like. @jbrockopp wdyt?

type Operator interface {
    // replaces Worker.operate(), called in Worker.Start()
    Operate(context.Context)         error

    // called in Operate()
    CheckInLoop(context.Context)     error

    // replaces Worker.exec(), called in Operate()
    GetNextBuild()                   *types.Item
    CreateExecutor(int, *types.Item) error
    RunExecutor(int)                 error

    // called via api/middleware
    GetExecutors()                   (map[int]*executor.Engine, error)
    // called via api/middleware and in RunExecutor()
    GetExecutor(int)                 (*executor.Engine, error)
}

// implements Operator
type operator struct {...}

func (o *operator) Operate(ctx context.Context) error {
    executors, gctx := errgroup.WithContext(ctx)

    executors.Go(o.CheckInLoop)

    for i := 0; i < o.Config.Build.Limit; i++ {
        id := i
        executors.Go(func() error {
            select {
            case <-ctx.Done():
                return nil
            default:

                item, err := o.GetNextBuild()
                err := o.CreateExecutor(id, item)
                err := o.RunExecutor(id)
            }
        })
    }

    return executors.Wait()
}

Instead of adding "executors" to the gin context, we would add "operator", and then

All of the tests in this issue would be focused on the Operator.RunExecutor(int) method instead of the exec() method. Then it would be simple to create appropriate mocks for GetNextBuild and CreateExecutor, to effectively just test the executor+runtime.