cdevents / spec

A common specification for Continuous Delivery events
Apache License 2.0

Use Case: Parent Pipeline Triggering Multiple Child Pipelines #142

Open PradeepGopalgowda opened 1 year ago

PradeepGopalgowda commented 1 year ago

From a CDEvents point of view, we have the flexibility to define the content of events produced by pipeline orchestrators. While the specifics of the parent pipeline and pipeline orchestrator behavior may not be defined in CDEvents, we can consider these aspects when defining the data model of the PipelineRun and TaskRun events, as well as other events that may be necessary.
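For illustration only, here is a rough sketch, as a plain Python dict, of what such a data model could carry for a pipelineRun "finished" event. The context/subject shape follows the general structure of the CDEvents spec, but the exact field names, type versions, and the customData extension shown here are assumptions and should be checked against the spec itself:

```python
# Illustrative sketch of a CDEvents-style pipelineRun "finished" payload.
# Field names approximate the CDEvents context/subject structure; consult
# the spec for the authoritative schema and event type versions.
child_pipeline_finished = {
    "context": {
        "version": "0.3.0",                      # assumed spec version
        "id": "e2f7c6a0-0000-4000-8000-000000000001",
        "source": "/ci/child-pipeline-orchestrator",
        "type": "dev.cdevents.pipelinerun.finished.0.1.1",  # assumed type version
        "timestamp": "2023-06-01T12:00:00Z",
    },
    "subject": {
        "id": "child-run-42",
        "source": "/ci/child-pipeline-orchestrator",
        "type": "pipelineRun",
        "content": {
            "pipelineName": "integration-tests",  # hypothetical names
            "url": "https://ci.example.com/runs/child-run-42",
            "outcome": "success",
        },
    },
    # customData could carry orchestrator-specific details, e.g. a
    # reference to the parent run that triggered this child pipeline.
    "customData": {"parentRunId": "parent-run-7"},
}
```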

Overview: The goal of this implementation is to leverage a parent-child pipeline architecture for effective management of complex workflows. The parent pipeline acts as a control mechanism, responsible for triggering and coordinating the execution of multiple child pipelines, each representing a specific task or sub-process within the larger workflow.

Workflow Complexity: It is crucial to account for scenarios where the overall workflow becomes complex, involving multiple interdependent tasks or parallel execution of independent tasks. The parent pipeline should be designed to handle such complexity efficiently, ensuring proper sequencing, synchronization, and management of the entire workflow.

Dependency Management: Child pipelines often rely on dependencies from other tasks or data sources. The parent pipeline should be capable of resolving these dependencies by triggering the child pipelines in the correct order. It should possess the ability to determine and manage the dependencies between different tasks, guaranteeing that each child pipeline receives the required input.

Parallel Execution: To expedite the overall workflow, the parent pipeline should support parallel execution of multiple child pipelines when applicable. It should be capable of coordinating and tracking the progress of each child pipeline, enabling efficient resource utilization and reducing overall processing time.

Error Handling and Recovery: The parent pipeline needs to monitor the execution of each child pipeline and handle any errors or exceptions that may occur during their execution. It should implement strategies such as retries, error logging, or fallback actions to ensure the successful completion of the entire workflow. In the event of failure, the parent pipeline should initiate appropriate recovery procedures or trigger specific error-handling child pipelines.

Workflow Monitoring and Reporting: To provide visibility into the progress and status of the overall workflow, the parent pipeline should collect metrics, logs, and notifications from child pipelines. It should aggregate this information into a comprehensive view of the workflow execution, enabling monitoring, troubleshooting, performance analysis, and report generation.

Scalability and Modularity: The architecture should be designed to decompose a complex workflow into smaller, manageable child pipelines. Each child pipeline should be independently developed, tested, and maintained, allowing for easier modifications or enhancements to specific components without impacting the entire workflow. The solution should be scalable and modular, facilitating the efficient execution of large-scale data processing tasks or multi-step workflows.

By addressing the points mentioned above, we aim to implement a robust parent-child pipeline architecture using CDEvents. This approach improves the management of complex workflows, handles dependencies, enables parallel processing, facilitates error recovery, provides monitoring capabilities, and ensures scalability and modularity.
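As a very rough sketch of how a parent orchestrator could consume child pipelineRun events to cover the retry, monitoring, and reporting points above, assuming the payload shape sketched earlier (all helper functions here are hypothetical placeholders, not part of CDEvents or any SDK):

```python
# Hypothetical sketch of a parent orchestrator reacting to child pipelineRun
# events. trigger_child and record_status are illustrative placeholders.
from collections import defaultdict

MAX_RETRIES = 2
retries = defaultdict(int)

def trigger_child(run_id: str) -> None:
    # Placeholder: ask the orchestrator to re-run the child pipeline.
    print(f"re-triggering child pipeline run {run_id}")

def record_status(run_id: str, outcome) -> None:
    # Placeholder: aggregate child outcomes for workflow-level reporting.
    print(f"child {run_id} finished with outcome={outcome}")

def on_cdevent(event: dict) -> None:
    """React to child pipelineRun events carried as CDEvents."""
    event_type = event["context"]["type"]
    child_run = event["subject"]["id"]

    if event_type.startswith("dev.cdevents.pipelinerun.started"):
        print(f"child pipeline run {child_run} started")
    elif event_type.startswith("dev.cdevents.pipelinerun.finished"):
        outcome = event["subject"]["content"].get("outcome")
        if outcome == "failure" and retries[child_run] < MAX_RETRIES:
            retries[child_run] += 1
            trigger_child(child_run)          # simple retry strategy
        else:
            record_status(child_run, outcome)  # monitoring and reporting
```

Note that in this sketch the sequencing, retry, and reporting logic lives entirely in the consumer of the events; the child pipelines only announce what happened.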

afrittoli commented 1 year ago

Thanks @PradeepGopalgowda for this use case. I think that the work we are doing in https://github.com/cdevents/spec/issues/104 may help with your use case, but it may not cover all points you highlighted.

afrittoli commented 1 year ago

> From a CDEvents point of view, we have the flexibility to define the content of events produced by pipeline orchestrators. While the specifics of the parent pipeline and pipeline orchestrator behavior may not be defined in CDEvents, we can consider these aspects when defining the data model of the PipelineRun and TaskRun events, as well as other events that may be necessary.

Sounds good. Ideally, CDEvents could enable users to run event-driven pipelines across tools.

> Overview: The goal of this implementation is to leverage a parent-child pipeline architecture for effective management of complex workflows. The parent pipeline acts as a control mechanism, responsible for triggering and coordinating the execution of multiple child pipelines, each representing a specific task or sub-process within the larger workflow.

CDEvents are declarative as opposed to imperative: the source pipeline lets the world know that certain task and pipeline events occurred and provides all the context required for child pipelines to react. It is, however, generally up to each consumer to decide whether and how to react to a given event.
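For example, the producer side of this model could look roughly like the following. The broker endpoint and HTTP transport are assumptions; CDEvents are commonly carried as CloudEvents over HTTP, but the spec does not mandate a particular transport:

```python
# Producer-side sketch: the source pipeline simply announces a fact to a
# broker and moves on; it does not name or wait for any consumer.
import json
from urllib.request import Request, urlopen

BROKER_URL = "https://events.example.com/broker"  # hypothetical endpoint

def emit(event: dict) -> None:
    # Fire-and-forget: which consumers react, if any, is not the
    # producer's concern.
    req = Request(
        BROKER_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urlopen(req)
```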

A pipeline may send events about a build task being completed and an artifact being packaged. CVE scanning tools may react to those events by running their checks. The power of this approach is that more tools may react to the same event without having to change the source pipeline. The potential downside is that the source pipeline may not easily "wait" for such tools to complete since it is not necessarily known in advance which checks will be executed.
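And a sketch of the consumer side: a scanner that subscribes to the event stream and decides on its own to react to artifact packaged events, without the source pipeline knowing it exists. The scan_artifact helper is a hypothetical placeholder, and the event type prefix follows the spirit of the spec:

```python
# Decoupled consumer sketch: a CVE scanner reacts to artifact packaged
# events on its own initiative. scan_artifact is a hypothetical placeholder.
def scan_artifact(purl: str) -> None:
    # Placeholder for invoking a real CVE scanner.
    print(f"scanning {purl} for known CVEs")

def on_cdevent(event: dict) -> None:
    event_type = event["context"]["type"]
    # This consumer only cares about "artifact packaged" events.
    if event_type.startswith("dev.cdevents.artifact.packaged"):
        # For artifact events the subject id is typically a package URL (purl).
        scan_artifact(event["subject"]["id"])
```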

In an event-driven scenario, I would translate this into the following architecture:

[I will paste a diagram here soon]


afrittoli commented 1 year ago

Issue #39 describes a use case that CDEvents is interested in addressing: triggering CI/CD workflows on downstream components from upstream dependency events. That seems relevant to the "Dependency Management" point in this issue.

e-backmark-ericsson commented 1 year ago

@PradeepGopalgowda, is the following use case relevant to this request? A test step in a pipeline is replaced with a call to an external test system, which may have its own pipeline and events. Events sent from that external test system's pipeline should be able to refer to the caller's events somehow.

PradeepGopalgowda commented 1 year ago

@e-backmark-ericsson Yes, the use case you described is relevant to the request. It involves integrating an external test system into a pipeline and enabling event references between the caller's events and the events generated by the external test system's pipeline.
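One possible way to express such a reference today, pending whatever the spec ends up defining for linking runs, would be to carry the caller's identifiers in customData. A hypothetical sketch, with illustrative field names and an assumed event type/version:

```python
# Hypothetical sketch: an event emitted by the external test system that
# refers back to the caller's pipeline run. How such references should be
# modelled in CDEvents is exactly what this issue asks to define; customData
# is used here only as a stand-in.
external_test_finished = {
    "context": {
        "version": "0.3.0",                     # assumed spec version
        "id": "7d1c0b2e-0000-4000-8000-000000000002",
        "source": "/test/external-test-system",
        "type": "dev.cdevents.testcaserun.finished.0.1.0",  # assumed type/version
        "timestamp": "2023-06-01T13:00:00Z",
    },
    "subject": {
        "id": "ext-test-run-9",
        "source": "/test/external-test-system",
        "type": "testCaseRun",
        "content": {"outcome": "pass"},
    },
    "customData": {
        # Reference back to the caller's pipelineRun and to the event that
        # triggered this external test run (illustrative field names).
        "callerPipelineRun": "parent-run-7",
        "callerEventId": "e2f7c6a0-0000-4000-8000-000000000001",
    },
}
```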