cloud-native-compositions / compositions

Add Observability Features for Cloud-Native Compositions #17

Open mlunadia opened 1 week ago

Why

As the design of Cloud-Native Compositions evolves, it's important to incorporate observability as a core feature from the outset. Observability is crucial for ensuring that users can understand, debug, and monitor the behaviour of their compositions effectively.

Without built-in observability, users will struggle to diagnose issues such as failed deployments, misconfigured resources, or unexpected runtime behaviour. By thinking about observability ahead of time, CNC can provide a seamless and transparent user experience.

What

Proposed Features (where applicable)

  1. Status and Events:

    • Design the composition's Custom Resource Definition (CRD) to include a status field that displays:
        • The current state (e.g., Pending, Running, Failed, Completed).
        • Detailed error messages for failed states.
        • Progress tracking for multi-stage compositions.
    • Plan to emit Kubernetes events for composition lifecycle changes to integrate with existing Kubernetes tooling.
  2. Logging:

    • Include structured logging for each stage of the composition's execution.
    • Ensure logs are centralized and easily accessible for expanders, facades, and any dynamic behavior.
    • Make logs accessible through standard Kubernetes tools like kubectl.
  3. Tracing:

    • Build support for OpenTelemetry to trace the lifecycle of a composition and its associated resources.
    • Include spans for:
        • Composition parsing and validation.
        • Resource creation, updates, and reconciliation.
    • Plan for integration with popular observability platforms (e.g., Prometheus, Grafana, Elastic).
  4. kubectl Integration:

    • Envision integration with the kubectl commands that explore resource status (get, describe, etc.) so that users can monitor their compositions in real time.

Benefits

This issue can be broken down into smaller deliverables. An iterative approach would help ensure CNC is built with observability as a first-class feature, which avoids technical debt and should improve user adoption.