Green-Software-Foundation / if

Impact Framework
https://if.greensoftware.foundation/
MIT License

time series arithmetic between the different elements #994

Open zanete opened 1 month ago

zanete commented 1 month ago

Why: Sub of #949. In order to create a realistic manifest file for the GSF website.

What: We need the ability to carry out simple arithmetic between the different elements.

Context

Let's say we have a component containing a time series for number of page views per hour, which may have been populated using an importer plugin for e.g. google analytics.

We also have a separate component, e.g. web-server that has impacts for energy and carbon in the same time intervals as the page visits.

Now we want to calculate our SCI score by dividing the carbon in each observation in the web-server component by the page views in the page-views component. We can't, because all the information needed to process an observation and create a new output value has to exist within the same component as that observation.

This is problematic because it suggests we have to either know the page views in advance and manually add them everywhere we need them across our manifest, or we have to run some importer plugin for every component in the tree that wants to access that data, leading to a lot of repetition, points of failure and unnecessary carbon expenditure.

What this amounts to is that today, unless we want to make manual interventions to the manifest, we cannot use time series data for our functional unit in SCI calculations.
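The calculation described above amounts to joining two time series on their timestamps and dividing one by the other. Here is a minimal sketch in Python, assuming both components expose a list of observations keyed by `timestamp`. The data, the function name `sci_per_observation` and the in-memory layout are all hypothetical and are not the framework's actual internals.

```python
# Hypothetical observations; field names follow the manifests in this issue,
# but the values for the second hour are invented for illustration.
server_outputs = [
    {"timestamp": "2023-07-06T00:00", "duration": 3600, "carbon": 5.02},
    {"timestamp": "2023-07-06T01:00", "duration": 3600, "carbon": 4.0},
]
page_views = [
    {"timestamp": "2023-07-06T00:00", "duration": 3600, "page-visits": 10},
    {"timestamp": "2023-07-06T01:00", "duration": 3600, "page-visits": 8},
]


def sci_per_observation(carbon_series, unit_series, unit_key):
    """Divide carbon by the functional unit found at the same timestamp."""
    units_by_time = {obs["timestamp"]: obs[unit_key] for obs in unit_series}
    results = []
    for obs in carbon_series:
        timestamp = obs["timestamp"]
        if timestamp not in units_by_time:
            # The two series are not aligned; fail loudly rather than guess.
            raise KeyError(f"no {unit_key} value for timestamp {timestamp}")
        units = units_by_time[timestamp]
        if units == 0:
            raise ValueError(f"{unit_key} is zero at {timestamp}")
        results.append({**obs, "sci": obs["carbon"] / units})
    return results


sci_series = sci_per_observation(server_outputs, page_views, "page-visits")
```

The sketch raises an error on a missing timestamp instead of interpolating, since handling misaligned series is a separate design question.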

Here's what we want to be able to do:

We might have to assert that --observe plugins across the whole tree are executed before any --compute plugins are executed, otherwise we have ordering requirements for certain compute plugins (e.g. we could try to execute a sci that relies on some functional unit in another component where those values haven't been imported yet).
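One way to picture this ordering guarantee is a two-pass walk over the tree: every observe step runs in the first pass, and compute steps only start once that pass has finished. The sketch below assumes a simplified tree of nested dictionaries; `walk`, `run_tree` and the `run_plugin` callback are hypothetical names, not framework APIs.

```python
def walk(node, name="root"):
    """Yield (name, node) for a node and all of its descendants."""
    yield name, node
    for child_name, child in node.get("children", {}).items():
        yield from walk(child, child_name)


def run_tree(tree, run_plugin):
    """Run all observe steps across the tree, then all compute steps."""
    executed = []
    for phase in ("observe", "compute"):
        for name, node in walk(tree):
            for plugin in node.get("pipeline", {}).get(phase, []):
                run_plugin(name, plugin)
                executed.append((phase, name, plugin))
    return executed


# Hypothetical tree: the server component is listed first, but its sci step
# depends on page views observed in a later sibling.
tree = {
    "children": {
        "server": {"pipeline": {"compute": ["sci"]}},
        "page-views": {"pipeline": {"observe": ["analytics-importer"]}},
    }
}

order = run_tree(tree, lambda component, plugin: None)
```

With this ordering, the position of a component in the tree no longer matters for compute plugins that read values observed elsewhere.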

Note: Why not just use the importer inside each component and add the page-visits to each observation? A few reasons. First, it's a wasteful way to get the data: it would require an external API call per component for data we already have, which is inefficient in time, energy and carbon. Also, it's plausible the response could change from one component to another. It also requires that the data arriving from the importer is already synced with the existing set of timestamps, which it may or may not be; this would be tricky to handle internally. These are the reasons I think separate components plus cross-component operations are the way to go.

Narek's implementation notes

To let the framework know that we want to reuse the observed value in other child components, we have to pass a store-result: true flag in the plugin config in the initialize section, like this:

azure-importer:
  store-result: true
  ...

In the pipeline, the user can give the name of the plugin together with the component's name to reuse its data:

pipeline:
  compute:
    - child-1:azure-importer
  regroup:
    - some-field
...

Note from @jmcook1186: I prefer something like global: true compared to store-result: true. Then we can invoke using global: page-views rather than using the original component name.

Meanwhile, the framework will check whether the name in the compute section is present in the plugins storage. If it is, the plugin is executed from scratch; otherwise, the framework checks the results storage to see whether any data was saved by a previous child component.
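The lookup order described above can be sketched as a small resolver. This is only an illustration, assuming two plain dictionaries stand in for the plugins storage and the results storage. The name `resolve_step`, the `component:plugin` key format (borrowed from the `child-1:azure-importer` example) and the stored values are hypothetical.

```python
def resolve_step(step, plugins, stored_results):
    """Decide whether a pipeline step is executed or served from storage."""
    if step in plugins:
        # Known plugin name: execute it from scratch for this component.
        return ("execute", plugins[step])
    if step in stored_results:
        # Not a plugin name: reuse data saved by an earlier child component.
        return ("reuse", stored_results[step])
    raise KeyError(f"'{step}' is neither a plugin nor a stored result")


# Hypothetical storages for illustration only.
plugins = {"azure-importer": "azure-importer-instance"}
stored_results = {
    "child-1:azure-importer": [{"timestamp": "2023-07-06T00:00", "duration": 3600}],
}

fresh = resolve_step("azure-importer", plugins, stored_results)
reused = resolve_step("child-1:azure-importer", plugins, stored_results)
```

Raising an error for an unknown name is a choice made in this sketch; the notes above do not say what the framework should do in that case.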

Scope of work:

Acceptance Criteria

Scenario 1

GIVEN the cross-component operations are working WHEN I run the following manifest:

name: sci demo
description: successful path
tags:
initialize:
  plugins:
    page-visits:
      kind: plugin
      global: true
      method: AnalyticsImporter
      path: "some-path"
      config:
        functional-unit: requests
        output-parameter: 'page-visits'
    sci:
      kind: plugin
      method: Sci
      path: "builtin"
      config:
        functional-unit: global/page-visits
tree:
  children:
    component-1:
      pipeline:
        compute:
          - page-visits
      inputs:
    server:
      pipeline:
        compute:
          - sci
      inputs:
        - timestamp: 2023-07-06T00:00
          duration: 3600
          energy: 5
          carbon-operational: 5
          carbon-embodied: 0.02
          carbon: 5.02

I get the following output:

name: sci demo
description: successful path
tags:
initialize:
  plugins:
    page-visits:
      kind: plugin
      global: true
      method: AnalyticsImporter
      path: "some-path"
      config:
        output-parameter: 'page-visits'
    sci:
      kind: plugin
      method: Sci
      path: "builtin"
      config:
        functional-unit: global/page-visits
tree:
  children:
    component-1:
      pipeline:
        compute:
          - page-visits
      inputs:
        - timestamp: 2023-07-06T00:00
          duration: 3600
          page-visits: 10
    server:
      pipeline:
        compute:
          - sci
      inputs:
        - timestamp: 2023-07-06T00:00
          duration: 3600
          energy: 5
          carbon-operational: 5
          carbon-embodied: 0.02
          carbon: 5.02
      outputs:
        - timestamp: 2023-07-06T00:00
          duration: 3600
          energy: 5
          carbon-operational: 5
          carbon-embodied: 0.02
          carbon: 5.02
          sci: 0.502

zanete commented 3 hours ago

@jawache please review this solution