Green-Software-Foundation / if

Impact Framework
https://if.greensoftware.foundation/
MIT License

Add `--append` feature to IF #845

Closed: jmcook1186 closed this issue 2 weeks ago

jmcook1186 commented 3 months ago

What: Sub-issue of #764. Add an `--append` mode to IF that takes a manifest that already has outputs and, instead of overwriting those outputs, adds new timesteps to them.

Why: Enables IF to be run continuously or as batch jobs while still yielding a single output manifest.

Context

We want people to be able to have intermittent IF runs that append output data to a file rather than each independent run overwriting the outputs section.

This would work when an importer in the observe pipeline is configured to grab data using a relative time definition such as latest or daily, meaning the timestamps are not hardcoded into the manifest but are inferred from the time of execution. In this case, the same manifest returns data for a different time range each time it is executed, and each new set of data overwrites what was there before. Instead, we would like to add an `--append` flag to the CLI that configures IF to add the new inputs and outputs to the manifest rather than overwriting them.
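A minimal sketch of the merge step that append mode implies, written in TypeScript since IF itself is a TypeScript project. The `Observation` and `TreeNode` types and the `appendObservations`/`appendNode` helpers are hypothetical simplifications, not IF's internal API. Keeping the existing row when a timestamp repeats is also an assumption; the issue only says new timesteps are added.

```typescript
// Hypothetical, simplified shape of one input/output row in a manifest node.
type Observation = {
  timestamp: string; // ISO 8601, e.g. '2024-03-05T00:00:00.000Z'
  duration: number; // seconds
  [parameter: string]: string | number;
};

// Hypothetical, simplified node holding a time series.
type TreeNode = {
  inputs?: Observation[];
  outputs?: Observation[];
};

// Append fresh rows to existing ones instead of overwriting them.
// Assumption: if a fresh row repeats an existing timestamp, the existing row wins.
function appendObservations(
  existing: Observation[] = [],
  fresh: Observation[] = [],
): Observation[] {
  const seen = new Set(existing.map((row) => row.timestamp));
  const merged = [...existing];
  for (const row of fresh) {
    if (!seen.has(row.timestamp)) {
      merged.push(row);
      seen.add(row.timestamp);
    }
  }
  // Keep the series ordered by time so downstream plugins see monotonic timestamps.
  return merged.sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
}

// Merge a node from a previous run with the node produced by the current run.
function appendNode(previous: TreeNode, next: TreeNode): TreeNode {
  return {
    inputs: appendObservations(previous.inputs, next.inputs),
    outputs: appendObservations(previous.outputs, next.outputs),
  };
}
```

Keying on `timestamp` keeps repeated runs over overlapping windows from duplicating rows; a real implementation would likely also need to key on component identity (for example `name`) when several components share a node.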

--append can have some firm boundaries at this stage to make the feature simpler to build, for example:

Prerequisites/resources: none

SoW (scope of work)

Acceptance criteria


AND now I open this file and update the timestamps in the mock observation config so they are more recent, without removing any of the `inputs` or `outputs`

```yaml
name: mock-cpu-util-to-carbon
description: >-
  a complete pipeline that starts with mocked CPU utilization data and outputs
  operational carbon in gCO2eq
initialize:
  plugins:
    mock-observations:
      path: builtin
      method: MockObservations
      global-config:
        timestamp-from: '2024-03-05T00:00:04.000Z'
        timestamp-to: '2024-03-05T00:00:07.000Z'
        duration: 1
        components:
          - name: server-1
            cloud/instance-type: Standard_E64_v3
            cloud/region: westus3
        generators:
          common:
            cloud/vendor: azure
          randint:
            cpu/energy:
              min: 1
              max: 99
            mem/energy:
              min: 1
              max: 99
    sum:
      path: builtin
      method: Sum
      global-config:
        input-parameters:
          - cpu/energy
          - mem/energy
        output-parameter: energy
execution:
  command: >-
    /home/user/.npm/_npx/1bf7c3c15bf47d04/node_modules/.bin/ts-node
    /home/user/Code/if/src/index.ts -m
    manifests/examples/mock-cpu-util-to-carbon.yml -s
  environment:
    if-version: 0.4.0
    os: linux
    os-version: 5.15.0-107-generic
    node-version: 21.4.0
    date-time: 2024-06-18T14:18:44.864Z (UTC)
    dependencies:
      - '@babel/core@7.22.10'
      - '@babel/preset-typescript@7.23.3'
      - '@commitlint/cli@18.6.0'
      - '@commitlint/config-conventional@18.6.0'
      - '@grnsft/if-core@0.0.3'
      - '@jest/globals@29.7.0'
      - '@types/jest@29.5.8'
      - '@types/js-yaml@4.0.9'
      - '@types/luxon@3.4.2'
      - '@types/node@20.9.0'
      - axios-mock-adapter@1.22.0
      - axios@1.7.2
      - cross-env@7.0.3
      - csv-parse@5.5.6
      - csv-stringify@6.4.6
      - fixpack@4.0.0
      - gts@5.2.0
      - husky@8.0.3
      - jest@29.7.0
      - js-yaml@4.1.0
      - lint-staged@15.2.2
      - luxon@3.4.4
      - release-it@16.3.0
      - rimraf@5.0.5
      - ts-command-line-args@2.5.1
      - ts-jest@29.1.1
      - typescript-cubic-spline@1.0.1
      - typescript@5.2.2
      - winston@3.11.0
      - zod@3.22.4
  status: success
tree:
  pipeline:
    - mock-observations
    - sum
  defaults: null
  config:
    group-by:
      group:
        - cloud/region
        - name
  inputs:
    - timestamp: '2024-03-05T00:00:00.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 5
      mem/energy: 10
    - timestamp: '2024-03-05T00:00:01.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 71
      mem/energy: 5
    - timestamp: '2024-03-05T00:00:02.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
  outputs:
    - timestamp: '2024-03-05T00:00:00.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 5
      mem/energy: 10
      energy: 15
    - timestamp: '2024-03-05T00:00:01.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 71
      mem/energy: 5
      energy: 76
    - timestamp: '2024-03-05T00:00:02.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
      energy: 110
```

WHEN I run the manifest with `if-run -m manifest.yml --append`

THEN, if I open `manifest.yml`, it contains the following:

```yaml
name: mock-cpu-util-to-carbon
description: >-
  a complete pipeline that starts with mocked CPU utilization data and outputs
  operational carbon in gCO2eq
initialize:
  plugins:
    mock-observations:
      path: builtin
      method: MockObservations
      global-config:
        timestamp-from: '2024-03-05T00:00:04.000Z'
        timestamp-to: '2024-03-05T00:00:07.000Z'
        duration: 1
        components:
          - name: server-1
            cloud/instance-type: Standard_E64_v3
            cloud/region: westus3
        generators:
          common:
            cloud/vendor: azure
          randint:
            cpu/energy:
              min: 1
              max: 99
            mem/energy:
              min: 1
              max: 99
    sum:
      path: builtin
      method: Sum
      global-config:
        input-parameters:
          - cpu/energy
          - mem/energy
        output-parameter: energy
execution:
  command: >-
    /home/user/.npm/_npx/1bf7c3c15bf47d04/node_modules/.bin/ts-node
    /home/user/Code/if/src/index.ts -m
    manifests/examples/mock-cpu-util-to-carbon.yml -s
  environment:
    if-version: 0.4.0
    os: linux
    os-version: 5.15.0-107-generic
    node-version: 21.4.0
    date-time: 2024-06-18T14:18:44.864Z (UTC)
    dependencies:
      - '@babel/core@7.22.10'
      - '@babel/preset-typescript@7.23.3'
      - '@commitlint/cli@18.6.0'
      - '@commitlint/config-conventional@18.6.0'
      - '@grnsft/if-core@0.0.3'
      - '@jest/globals@29.7.0'
      - '@types/jest@29.5.8'
      - '@types/js-yaml@4.0.9'
      - '@types/luxon@3.4.2'
      - '@types/node@20.9.0'
      - axios-mock-adapter@1.22.0
      - axios@1.7.2
      - cross-env@7.0.3
      - csv-parse@5.5.6
      - csv-stringify@6.4.6
      - fixpack@4.0.0
      - gts@5.2.0
      - husky@8.0.3
      - jest@29.7.0
      - js-yaml@4.1.0
      - lint-staged@15.2.2
      - luxon@3.4.4
      - release-it@16.3.0
      - rimraf@5.0.5
      - ts-command-line-args@2.5.1
      - ts-jest@29.1.1
      - typescript-cubic-spline@1.0.1
      - typescript@5.2.2
      - winston@3.11.0
      - zod@3.22.4
  status: success
tree:
  pipeline:
    - mock-observations
    - sum
  defaults: null
  config:
    group-by:
      group:
        - cloud/region
        - name
  inputs:
    - timestamp: '2024-03-05T00:00:00.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 5
      mem/energy: 10
    - timestamp: '2024-03-05T00:00:01.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 71
      mem/energy: 5
    - timestamp: '2024-03-05T00:00:02.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
    - timestamp: '2024-03-05T00:00:03.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
    - timestamp: '2024-03-05T00:00:04.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
    - timestamp: '2024-03-05T00:00:05.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
    - timestamp: '2024-03-05T00:00:06.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
  outputs:
    - timestamp: '2024-03-05T00:00:00.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 5
      mem/energy: 10
      energy: 15
    - timestamp: '2024-03-05T00:00:01.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 71
      mem/energy: 5
      energy: 76
    - timestamp: '2024-03-05T00:00:02.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
      energy: 110
    - timestamp: '2024-03-05T00:00:03.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
      energy: 110
    - timestamp: '2024-03-05T00:00:04.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
      energy: 110
    - timestamp: '2024-03-05T00:00:05.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
      energy: 110
    - timestamp: '2024-03-05T00:00:06.000Z'
      duration: 1
      name: server-1
      cloud/instance-type: Standard_E64_v3
      cloud/region: westus3
      cloud/vendor: azure
      cpu/energy: 36
      mem/energy: 74
      energy: 110
```
zanete commented 3 months ago

@jawache please review the AC

jamescrowley commented 2 months ago

@jawache @zanete I’d be happy to take this one if useful, let me know

zanete commented 2 months ago

@jamescrowley that's great to hear, let me tag @jmcook1186 so he is aware and can comment if there's anything standing in the way 🙏

jmcook1186 commented 2 months ago

Hi @jamescrowley - yes, please go for it - thanks!

zanete commented 2 months ago

Hi @jamescrowley, I hope you’re doing great! I just wanted to check in and see how you're doing with this feature. Please feel free to share any updates or questions you have for us to discuss! 🙏

jamescrowley commented 1 month ago

@zanete I'm starting by setting up some integration tests that capture the requirement defined above. However, several of the current integration tests do not pass locally. For example:

```
Executing `aggregate.yaml`
✖ Files do not match!
tree.children.application.children.uk-west.children.server-1.aggregated.cpu/utilization
source: 148
target: 74
Executing `mock-obs-time-sync.yaml`
✖ Files do not match!
tree.children.child-1.outputs.0.cloud/instance-type
source: NaN
target: A1
Executing `success.yaml`
✖ Files do not match!
tree.children.child.outputs.0.duration
source: exists
target: missing
Executing `failure-not-matching-with-regex.yaml`
✖ Files do not match!
execution.status
source: success
target: fail
Executing `success.yml.yaml`
✖ [2024-07-25 02:35:13.797 PM] error:   ENOENT: no such file or directory, open 'if/manifests/outputs/plugins/sci/re-success.yml.yaml'
Error:  ENOENT: no such file or directory, open 'if/manifests/outputs/plugins/sci/re-success.yml.yaml'

---------
Check summary:
52 of 61 files are passed.
```

Are these failures expected? I noticed that in the CI setup they don't run on every PR, only for a release.

zanete commented 1 month ago

@jamescrowley thanks so much for the update! Let me tag @narekhovhannisyan who can hopefully shed some light on your question. 🙏

jamescrowley commented 1 month ago

@zanete @jmcook1186 Two questions:

  1. In the case of aggregation and group-bys, would you expect these to be applied over the combined (pre-existing and new) outputs, or only over the new outputs? I'm assuming the former, but let me know your thoughts.

  2. The example given in the issue has `timestamp-from: '2024-03-05T00:00:04.000Z'` for the re-run. I assume that was a typo, and that `timestamp-from` should be `'2024-03-05T00:00:03.000Z'` in order to get the output described in the example, but let me know if I've missed a nuance somewhere.
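The two questions above can be made concrete with a small TypeScript sketch. It assumes the mock observation window is half-open, `[timestamp-from, timestamp-to)`, stepping by `duration` seconds; that is an assumption consistent with the reasoning in item 2, not something confirmed here against the `MockObservations` source. `windowTimestamps`, `sumParameter`, and `Row` are hypothetical helpers, and the energy values are taken from the example manifest.

```typescript
// Hypothetical row shape: an ISO timestamp plus arbitrary parameters.
type Row = { timestamp: string; [parameter: string]: string | number };

// Assumption: the window is half-open [from, to), stepping by durationSeconds.
function windowTimestamps(from: string, to: string, durationSeconds: number): string[] {
  const timestamps: string[] = [];
  const end = Date.parse(to);
  for (let t = Date.parse(from); t < end; t += durationSeconds * 1000) {
    timestamps.push(new Date(t).toISOString());
  }
  return timestamps;
}

// Sum one numeric parameter over a set of rows; missing values count as 0.
function sumParameter(rows: Row[], parameter: string): number {
  return rows.reduce((total, row) => total + Number(row[parameter] ?? 0), 0);
}

// Item 2: starting at :04 yields only :04, :05, :06; starting at :03 adds the :03 row.
console.log(windowTimestamps('2024-03-05T00:00:04.000Z', '2024-03-05T00:00:07.000Z', 1).length); // 3
console.log(windowTimestamps('2024-03-05T00:00:03.000Z', '2024-03-05T00:00:07.000Z', 1).length); // 4

// Item 1: aggregating over combined outputs versus only the newly appended ones.
const existing: Row[] = [
  { timestamp: '2024-03-05T00:00:00.000Z', energy: 15 },
  { timestamp: '2024-03-05T00:00:01.000Z', energy: 76 },
  { timestamp: '2024-03-05T00:00:02.000Z', energy: 110 },
];
const fresh: Row[] = windowTimestamps('2024-03-05T00:00:03.000Z', '2024-03-05T00:00:07.000Z', 1)
  .map((timestamp) => ({ timestamp, energy: 110 }));
console.log(sumParameter([...existing, ...fresh], 'energy')); // 641 (combined)
console.log(sumParameter(fresh, 'energy')); // 440 (new rows only)
```

Under these assumptions, the combined aggregate matches what a single run over the whole period would report, which is the behaviour item 1 proposes.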

jamescrowley commented 1 month ago

I've pushed a rough draft for discussion here: https://github.com/Green-Software-Foundation/if/pull/932. Is it along the lines of what you had in mind?

zanete commented 1 month ago

Thanks so much @jamescrowley, let's get @jmcook1186 and @narekhovhannisyan to take a look, please 🙏