defenseunicorns / maru-runner

The Unicorn Task Runner
Apache License 2.0
10 stars 1 forks source link

Implement Concurrent Action Execution in `executeTask` #85

Closed naveensrinivasan closed 1 week ago

naveensrinivasan commented 8 months ago

Body:

Summary

The current implementation of the executeTask method in runner.go executes each action within a task sequentially. This approach ensures simplicity and straightforward execution flow but may not efficiently utilize system resources, especially when actions are independent and could be executed concurrently. This proposal suggests enhancing the executeTask method to support concurrent execution of actions, potentially improving performance and reducing task completion time.

Motivation

Tasks that contain multiple independent actions can benefit significantly from concurrent execution. In the current sequential execution model, the total execution time is the sum of all actions' execution times. By executing actions concurrently, we can reduce the total execution time to the duration of the longest-running action, assuming sufficient system resources. This improvement is particularly beneficial for tasks with many independent actions or tasks where actions are I/O bound and spend significant time waiting.

This should reduce the wait time to bootstrap an env

Proposed Changes

  1. Concurrency Model: Utilize Go's concurrency model, leveraging goroutines and channels, to execute actions within a task concurrently.
  2. Error Handling: Implement a mechanism to collect and handle errors from concurrently executed actions. This could involve aggregating errors and returning a composite error if multiple actions fail.
  3. Environment Variables Management: Ensure that the setup of default environment variables (defaultEnv) and task-specific environment variables (task.EnvPath) is thread-safe and correctly applied to each action executed concurrently.
  4. Dependency Management: Introduce a way to specify dependencies between actions, if necessary. This could be important for tasks where certain actions must be executed in sequence due to dependencies.

Implementation Details

Risks and Mitigations

Prototype

func (r *Runner) executeTask(task types.Task) error {
    if len(task.Files) > 0 {
        if err := r.placeFiles(task.Files); err != nil {
            return err
        }
    }

    defaultEnv := []string{}
    for name, inputParam := range task.Inputs {
        d := inputParam.Default
        if d == "" {
            continue
        }
        defaultEnv = append(defaultEnv, formatEnvVar(name, d))
    }

    // load the tasks env file into the runner, can override previous task's env files
    if task.EnvPath != "" {
        r.envFilePath = task.EnvPath
    }

    // Use a WaitGroup to wait for all goroutines to finish
    var wg sync.WaitGroup
    // Channel to collect errors from goroutines
    errChan := make(chan error, len(task.Actions))

    for _, action := range task.Actions {
        wg.Add(1)
        go func(action types.Action) {
            defer wg.Done()
            action.Env = mergeEnv(action.Env, defaultEnv)
            if err := r.performAction(action); err != nil {
                errChan <- err
                return
            }
            errChan <- nil
        }(action)
    }

    // Wait for all actions to complete
    wg.Wait()
    close(errChan)

    // Check for errors
    for err := range errChan {
        if err != nil {
            return err // or accumulate errors if you prefer
        }
    }

    return nil
}

not tested or compiled

Conclusion

Adopting concurrent action execution in the executeTaskmethod has the potential to significantly improve performance for tasks with multiple independent actions. By carefully managing the risks associated with concurrency, we can make the task execution process more efficient and responsive to user needs.

zachariahmiller commented 8 months ago

Hey @naveensrinivasan maybe i'm misunderstanding, but i dont think we ever want any tasks to ever be executed concurrently unless there is a mechanism in the task file's syntax explicitly indicating that should be the behavior or multiple tasks are called in a uds run command execution.

That being said, i had created a related [issue](https://github.com/defenseunicorns/uds-cli/issues/187) around when the task runner got added to this project where concurrent task execution would come into play.

The only other scenarios i can think of would again require another keyword to indicate that a task can run out of order of the other tasks (i think this might get convoluted in terms of the yaml syntax and we dont want that) or to allow multiple tasks to be executed by uds run like uds run lint test build and have a defined, documented behavior that running things like this will execute concurrently.

Outside of those scenarios and the use case described in the linked issue i think we'd want to be careful about how we approach that.

I'll definitely defer to @UncleGedd here if he has more specific thoughts, but just wanted to make you aware of that other issue that had been created. Thanks for the thorough issue definition here!

naveensrinivasan commented 8 months ago

Given the concerns raised about concurrent tasks in Kubernetes, it's crucial to understand how Kubernetes' architecture and its principle of eventual consistency effectively manage such scenarios. Kubernetes is designed to handle many operations concurrently, ensuring system stability and reliability. Here's a detailed explanation:

Kubernetes and Eventual Consistency

Kubernetes employs an eventual consistency model through its controllers and control loops. Each controller watches for changes to specific resources, analyzing the current state against the desired state and acting to reconcile any discrepancies. This continuous process allows Kubernetes to adapt to changes and maintain the desired system state over time.

Controller Lifecycle and Concurrent Tasks

The lifecycle of a Kubernetes controller can be summarized in a continuous loop of observing changes, analyzing the current versus desired state, and acting to reconcile. This loop ensures that Kubernetes can handle concurrent tasks without leading to system instability. Here's a simplified diagram illustrating this process:

graph TD;
    A[Start] --> B{Observe Changes};
    B --> C[Analyze Current vs. Desired State];
    C --> D{Is Reconciliation Needed?};
    D -->|Yes| E[Act to Reconcile];
    E --> B;
    D -->|No| F[Wait for Next Change];
    F --> B;

Why Concurrent Tasks Do Not Pose an Issue

  1. Decoupled Operations: Kubernetes controllers operate independently, allowing them to manage their respective resources concurrently without interference.
  2. Continuous Reconciliation: The control loop's design ensures that any inconsistencies introduced by concurrent tasks are detected and corrected in subsequent iterations.
  3. Concurrency Management: Kubernetes employs mechanisms like optimistic concurrency control to handle simultaneous updates, preventing conflicts.
  4. Resilience to Failures: The system's design inherently provides resilience to temporary failures or delays, common in concurrent operations. Controllers will retry operations as needed to achieve the desired state.

Caveats and Considerations

Despite the robustness of Kubernetes' design, certain scenarios require careful management of task sequences and dependencies to avoid issues:

  1. Persistent Volume Claims (PVCs) and Persistent Volumes (PVs): PVCs created before the availability of a suitable PV can remain pending, disrupting dependent workloads.
  2. ConfigMaps and Secrets as Volume Mounts: Pods failing to start if referenced ConfigMaps or Secrets are not pre-existing, necessitating recreation or restart of the Pod.
  3. Service Dependencies: Workloads may encounter errors or fail to function correctly if dependent services are not yet created or are misconfigured.
  4. Ingress and Service Dependencies: Misordered creation of Ingress and Services can lead to non-functional ingress until the Service is correctly configured.
  5. Role-Based Access Control (RBAC) Settings: Workloads requiring specific Kubernetes API permissions might fail if the necessary RBAC settings are not established beforehand.
  6. Custom Resource Definitions (CRDs) and Custom Resources (CRs): CRs applied before their CRDs can fail, common in environments with operators and custom controllers.
  7. Network Policies: Improperly ordered application of Network Policies can block necessary Pod communication.
  8. Init Containers Dependencies: Pods with init containers may be affected if these containers depend on resources that are not yet available.

Mitigation Strategies

For these scenarios, incorporating retry logic, readiness checks, and careful planning of task sequences can mitigate issues by ensuring that dependent resources are fully ready before proceeding with subsequent operations.

Conclusion

While Kubernetes' architecture and eventual consistency model provide a solid foundation for managing concurrent tasks, awareness of and planning for specific dependency scenarios are crucial. By understanding these caveats and employing strategic task management, we can leverage Kubernetes' strengths while mitigating potential issues related to resource dependencies and operation orders.

UncleGedd commented 8 months ago

Thanks for the conversation, concurrency isn't a priority atm but I'm sure it will be valuable in the future. I think for now we can table this issue while the runner code gets migrated to its own repo, and I'll transfer this issue to the new repo

zachariahmiller commented 8 months ago

I'll be honest I'm kind of confused by this discussion there is a lot of talk about kubernetes, but this function under uds run has nothing to do with kubernetes. Apologies if I'm missing something here but this is my reasoning:

It (uds run) is a task runner that essentially executes shell commands for use in CI and locally similar to what you would use make or Taskfile for. Also similar to say job execution in CI systems such as gitlab ci or GitHub workflows.

You might use a task to run a command that does something in kubernetes (like kubectl) but that is the tool this is executing doing that, not the tasks themselves and therefore assumptions based on how kubernetes works should not be built into the way we handle execution in uds run tasks.

naveensrinivasan commented 8 months ago

I'll be honest I'm kind of confused by this discussion there is a lot of talk about kubernetes, but this function under uds run has nothing to do with kubernetes. Apologies if I'm missing something here but this is my reasoning:

It (uds run) is a task runner that essentially executes shell commands for use in CI and locally similar to what you would use make or Taskfile for. Also similar to say job execution in CI systems such as gitlab ci or GitHub workflows.

You might use a task to run a command that does something in kubernetes (like kubectl) but that is the tool this is executing doing that, not the tasks themselves and therefore assumptions based on how kubernetes works should not be built into the way we handle execution in uds run tasks.

My ignorance! Thanks for the clarification! I appreciate it! My k8s post was me being a noob

zachariahmiller commented 8 months ago

I'll be honest I'm kind of confused by this discussion there is a lot of talk about kubernetes, but this function under uds run has nothing to do with kubernetes. Apologies if I'm missing something here but this is my reasoning:

It (uds run) is a task runner that essentially executes shell commands for use in CI and locally similar to what you would use make or Taskfile for. Also similar to say job execution in CI systems such as gitlab ci or GitHub workflows.

You might use a task to run a command that does something in kubernetes (like kubectl) but that is the tool this is executing doing that, not the tasks themselves and therefore assumptions based on how kubernetes works should not be built into the way we handle execution in uds run tasks.

My ignorance! Thanks for the clarification! I appreciate it! My k8s post was me being a noob

No worries. Totally understandable... it's a lot to try to digest all at once!

zachariahmiller commented 8 months ago

@naveensrinivasan just to close out this thread. We definitely would still want to retain the ability in bundle deployments (ie uds deploy) to explicitly control ordering of package deployments, but this issue is targeted at trying to find a way to sanely implement concurrency for certain packages to complement the sequential ordering that exists today.

https://github.com/defenseunicorns/uds-cli/issues/30

If there are no objections though, I think this specific issue can be closed.

UncleGedd commented 8 months ago

We can keep this issue, I'll migrate it over to the new runner repo once that is all set up. FWIW I could see a use case for concurrent tasks when doing things like building multiple, independent Zarf pkgs

naveensrinivasan commented 8 months ago

We can keep this issue, I'll migrate it over to the new runner repo once that is all set up. FWIW I could see a use case for concurrent tasks when doing things like building multiple, independent Zarf pkgs

Sounds good. Thanks!

Racer159 commented 1 week ago

closing in favor of #138 #131 and #164 since these focus on some of the more specific changes