Epic - Idempotent Manifests

Background and problem statement

A command is idempotent if it produces the same result no matter how many times it is executed. In the context of IF, we want manifests to be idempotent, so that re-executing a manifest always generates the same result. We can't guarantee this today. Many manifests have importer plugins as the first element in their pipelines, and we don't control the servers behind the APIs those importers call. The same request may not always yield the same response, so the same manifest may not always give the same output. We anticipate many (most?) manifests using importers in the future. We also anticipate that sharing and re-executing manifests will be one of the main use cases for IF, and today the lack of idempotence makes these two things somewhat incompatible.

However, we can fix this issue from IF's side by separating out the IF execution into distinct phases and enabling intermediates to be exported at the end of each phase. The first phase can be the data import, which generates a static file. This static file can then be shared, archived, or passed into the next phase of execution.
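The effect of exporting an intermediate can be sketched in a few lines. This is an illustration in Python, not IF code: `flaky_importer`, `observe_once` and `compute` are hypothetical names, and the counter only simulates an external API that answers differently on each call.

```python
import itertools

# Hypothetical stand-in for a remote API whose answer drifts between calls.
_readings = itertools.count(start=50)


def flaky_importer():
    """Simulated importer: every call returns a different reading."""
    return [{"timestamp": "2024-02-26 00:00:00", "cpu/utilization": next(_readings)}]


def observe_once(importer):
    """Run the importer a single time and keep the result as a static snapshot."""
    return importer()


def compute(observations):
    """Pure function of its inputs: same observations in, same outputs out."""
    return [{**row, "doubled": row["cpu/utilization"] * 2} for row in observations]


# Import + compute on every run: results differ between runs.
run_1 = compute(flaky_importer())
run_2 = compute(flaky_importer())

# Compute replayed on the saved snapshot: results are identical.
snapshot = observe_once(flaky_importer)
replay_1 = compute(snapshot)
replay_2 = compute(snapshot)
```

Once the snapshot exists, only the snapshot needs to be shared for someone else to reproduce the same outputs.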

Separating out distinct execution phases also helps with features such as time-sync and group-by. In our current monolithic design, we have to invoke these features at the right point in the pipeline for them to execute correctly. The right sequence is obvious for simple manifests, but not necessarily for more complex ones, which can lead to confusion and over-complicated manifest files.

It is also inefficient to have to re-execute an entire manifest, including the requests to external APIs in the importer plugins, just to change one element later in the execution pipeline. It would be much more efficient if we could capture static files at various points in the execution flow that can then be used to re-execute specific parts of the pipeline.

Solution

Split the pipeline and the compute logic into 3 distinct phases:
- observe: the pipeline that generates, imports, or gathers observations. The output of this pipeline should be a 1-D array of observations.
- group: groups the 1-D array of observations into a structure that makes sense for the compute step that follows (and for aggregation, export etc.), similar to the group-by builtin.
- compute: given a set of observations, this is the pipeline that calculates impacts.

tree:
  observe:
    - mock-observations
  group:
    - cloud/instance-type
  compute:
    - cloud-metadata
    - watttime
    - teads-curve
    - operational-carbon
  inputs: null

We first traverse the tree and run all the plugins in the observe pipeline. Their results are either added to the inputs (in additive run mode) or replace the inputs (in replace mode). We have the option to capture the state of the manifest after these observe operations have been applied and save it to a yaml file.
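The two run modes can be sketched as follows. This is a minimal Python illustration under stated assumptions, not IF's implementation: the `mode` parameter, the `run_observe` helper, and the pre-existing input row are all hypothetical. Only the mode names "additive" and "replace" come from the description above.

```python
def mock_observations():
    """Hypothetical observe plugin: returns a flat list of observations."""
    return [
        {"timestamp": "2024-02-26 00:00:00", "duration": 300, "cpu/utilization": 89},
        {"timestamp": "2024-02-26 00:05:00", "duration": 300, "cpu/utilization": 59},
    ]


def run_observe(node, plugins, mode="additive"):
    """Run every observe plugin and return a new node with updated inputs.

    'additive' appends the observations to the existing inputs;
    'replace' discards the existing inputs first.
    """
    existing = node.get("inputs") or []  # tolerates `inputs: null`
    observed = []
    for plugin in plugins:
        observed.extend(plugin())
    if mode == "additive":
        inputs = existing + observed
    elif mode == "replace":
        inputs = observed
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return {**node, "inputs": inputs}


# Illustrative node that already holds one (made-up) observation.
node = {"inputs": [{"timestamp": "2024-02-25 23:55:00", "duration": 300, "cpu/utilization": 70}]}

additive = run_observe(node, [mock_observations], mode="additive")
replaced = run_observe(node, [mock_observations], mode="replace")
from_null = run_observe({"inputs": None}, [mock_observations])
```

Returning a new node rather than mutating the old one keeps the pre-observe state available, which is convenient if the state is to be saved to a file afterwards.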

We then traverse the tree and run the grouping logic on the inputs. Again, we have the option to capture the state of the manifest after these group operations have been applied and save it to a yaml file.
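As a rough sketch of the grouping step, the function below nests a flat list of observations into `children` keyed by the value of each grouping field. It is a Python illustration only. The helper name `group_by` and its recursive shape are assumptions, not the builtin's actual code.

```python
def group_by(inputs, keys):
    """Nest flat observations into children keyed by successive field values."""
    if not keys:
        return {"inputs": list(inputs)}
    first, rest = keys[0], keys[1:]
    buckets = {}
    for row in inputs:
        # Observations that share a value for `first` land in the same bucket.
        buckets.setdefault(row[first], []).append(row)
    return {"children": {value: group_by(rows, rest) for value, rows in buckets.items()}}


observations = [
    {"timestamp": "2024-02-26 00:00:00", "duration": 300,
     "cloud/instance-type": "m5n.large", "cloud/vendor": "aws", "cpu/utilization": 89},
    {"timestamp": "2024-02-26 00:05:00", "duration": 300,
     "cloud/instance-type": "m5n.large", "cloud/vendor": "aws", "cpu/utilization": 59},
]

grouped = group_by(observations, ["cloud/instance-type"])
nested = group_by(observations, ["cloud/vendor", "cloud/instance-type"])
```

With a single key the result has one child, `m5n.large`, holding both observations. With two keys the children nest one level deeper, vendor first.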

We then traverse the tree and run the compute pipeline.

In the observe and group phases, only the inputs change. The compute phase doesn't change the inputs; it only generates outputs.
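The "inputs stay untouched, outputs are generated" contract could look like this in a minimal Python sketch. `run_compute` and `placeholder_energy` are hypothetical names, the node is flat rather than a tree, and the arithmetic in the plugin is made up for illustration and does not represent any real IF model.

```python
import copy


def placeholder_energy(row):
    """Hypothetical compute plugin with made-up arithmetic (not a real IF model)."""
    return {**row, "placeholder/energy": row["cpu/utilization"] * row["duration"]}


def run_compute(node, plugins):
    """Apply each plugin in order to a copy of the inputs; store the result as outputs."""
    rows = copy.deepcopy(node["inputs"])  # never touch the original inputs
    for plugin in plugins:
        rows = [plugin(row) for row in rows]
    return {**node, "outputs": rows}


node = {
    "inputs": [
        {"timestamp": "2024-02-26 00:00:00", "duration": 300, "cpu/utilization": 89},
        {"timestamp": "2024-02-26 00:05:00", "duration": 300, "cpu/utilization": 59},
    ]
}
before = copy.deepcopy(node["inputs"])
result = run_compute(node, [placeholder_energy])
```

Because the inputs are left as they were, the same static manifest can be fed through a modified compute pipeline again without re-running observe or group.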

These phases should all run in sequence when you run the ie command, which is the normal behaviour we have today. However, you should also be able to run each phase independently using the --observe, --group and --compute flags.
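One possible way to express "all phases by default, selected phases when flagged" is sketched below with Python's standard argparse module. This is not IF's CLI code. The `select_phases` helper is hypothetical, and the only flags it uses are the ones that appear in this issue (`--observe`, `--group`, `--compute`, `-m`, `-o`/`--output`).

```python
import argparse

# Phase order is fixed: a later phase depends on the output of the earlier ones.
PHASES = ("observe", "group", "compute")


def select_phases(argv):
    """Return the phases to run, in pipeline order, for the given CLI arguments."""
    parser = argparse.ArgumentParser(prog="ie")
    for phase in PHASES:
        parser.add_argument(f"--{phase}", action="store_true")
    parser.add_argument("-m", dest="manifest", required=True)
    parser.add_argument("-o", "--output", dest="output")
    args = parser.parse_args(argv)

    requested = [phase for phase in PHASES if getattr(args, phase)]
    # No phase flag at all means today's behaviour: run everything in sequence.
    return requested or list(PHASES)


only_observe = select_phases(["--observe", "-m", "manifest.yml", "--output", "static-manifest.yml"])
everything = select_phases(["-m", "manifest.yml"])
two_phases = select_phases(["--compute", "--group", "-m", "static-manifest.yml"])
```

Filtering against `PHASES` rather than trusting the order of the flags on the command line means `--compute --group` still runs group before compute.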

If you just wanted to gather observations without running the rest of the pipelines, you might run the manifest with just the --observe flag and end up with something like so:

tree:
  observe:
    - mock-observations
  group:
    - cloud/instance-type
  compute:
    - cloud-metadata
    - watttime
    - teads-curve
    - operational-carbon
  inputs:
    - timestamp: '2024-02-26 00:00:00'
      duration: 300
      cloud/instance-type: m5n.large
      cloud/vendor: aws
      cpu/utilization: 89
    - timestamp: '2024-02-26 00:05:00'
      duration: 300
      cloud/instance-type: m5n.large
      cloud/vendor: aws
      cpu/utilization: 59

ie --observe -m manifest.yml --output static-manifest.yml

This file can now act as a static manifest file which you can use without needing to run the importers.

Then you might run the above file with just --group and end up with something like so:

tree:
  observe:
    - mock-observations
  group:
    - cloud/instance-type
  compute:
    - cloud-metadata
    - watttime
    - teads-curve
    - operational-carbon
  children:
    m5n.large:
      inputs:
        - timestamp: '2024-02-26 00:00:00'
          duration: 300
          cloud/instance-type: m5n.large
          cloud/vendor: aws
          cpu/utilization: 89
        - timestamp: '2024-02-26 00:05:00'
          duration: 300
          cloud/instance-type: m5n.large
          cloud/vendor: aws
          cpu/utilization: 59

ie --group -m static-manifest.yml -o regrouped-manifest.yml

Again, this is a static manifest file. You can then run it with the --compute flag to run just the compute pipeline, which generates outputs.

ie --compute -m regrouped-manifest.yml -o outputs.yml

Tasks

[ ] #809
[x] #810
[x] #811
[ ] #812
[ ] #813

Green-Software-Foundation / if

Epic - Idempotent Manifests #762
