Green-Software-Foundation / if

Impact Framework
https://if.greensoftware.foundation/
MIT License

patch time sync plugin to handle stack too deep errors #993

Open zanete opened 3 months ago

zanete commented 3 months ago

Why: Sub of #949. The framework is built around very granular time units (seconds), but real-world measurements often arrive at much larger time intervals, which IF is unable to handle and which cause overflow errors.

What: Find an MVP solution that enables working with larger time intervals for the time being. We will return to this problem at a later date to implement a more robust solution.
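To give a feel for the scale involved, here is a rough TypeScript estimate of how many entries a set of coarse observations would expand into if each one were first upsampled to a fixed base resolution. The `Observation` shape and the `estimateBaseEntries` helper are hypothetical and are not taken from the IF codebase; the 1-second base resolution is an assumption about how the expansion works.

```typescript
// Hypothetical helper, not part of IF: counts the entries produced when every
// observation is expanded to a fixed base resolution (assumed to be 1 second).
interface Observation {
  timestamp: string; // RFC 3339 start time
  duration: number; // length of the observation in seconds
}

function estimateBaseEntries(
  observations: Observation[],
  baseResolutionSeconds = 1
): number {
  if (baseResolutionSeconds <= 0) {
    throw new RangeError('baseResolutionSeconds must be positive');
  }
  return observations.reduce(
    (total, obs) => total + Math.ceil(obs.duration / baseResolutionSeconds),
    0
  );
}

// Seven monthly observations of 2,629,800 s each (an average month).
const monthlyObservations: Observation[] = Array.from(
  {length: 7},
  (_, month) => ({
    timestamp: new Date(Date.UTC(2023, month, 1)).toISOString(),
    duration: 2629800,
  })
);

const entriesAtOneSecond = estimateBaseEntries(monthlyObservations);
const entriesAtOneHour = estimateBaseEntries(monthlyObservations, 3600);

console.log(entriesAtOneSecond); // 18408600
console.log(entriesAtOneHour); // 5117
```

Under that assumption, seven months of data for a single component already means more than 18 million intermediate entries, while an hourly base resolution would need only a few thousand.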

Context

We often have patchy data, we don't have granular data, or we are only interested in values at e.g. monthly time resolution.

We don't work well in these situations, even though they are very common - we're really just set up for situations where we can load in granular time series.

We often see people (and this is what I ended up doing in v1 of the GSF site manifest) have a single timestep with a duration of 1 month or one year in seconds and then execute a single, large pipeline that eventually yields SCI. This is because it's very fiddly and repetitive to compartmentalise into individual components and often we can't / don't really need to access granular temporal data anyway.

If I just want an overall SCI value for my entire application, I'm not going to spend a week trying to source granular data or, even worse, naively chunk the single value that fits my needs into time blocks just because that's what a manifest favours.

As a user in this situation it's much easier to interpret my manifest when I've just constructed a pipeline, can follow through the logic and read off my single value as compared to interpreting aggregated data, which is presented in quite a non-aesthetic way.

A separate time issue is the current behaviour of time sync. It works as a plugin, which means we have to invoke it in every pipeline in the tree. But we encountered some problematic side effects associated with making it a “global” feature. On balance we decided to stick with the status quo for now, but we need to revisit the way we handle time, perhaps starting from a blank page.

The current time-sync plugin often errors out with “RangeError: Maximum call stack size exceeded” when we’re using time units greater than minutes. For example, time-sync applied to monthly data, even when we just want to create time series at a resolution of 10 or even 100 hours, throws with this error. Maybe this is because of our “base” resolution being fixed at 1s, which creates too much data when we’re trying to operate over monthly data. Either way, this needs fixing or we’re only applicable to observations over seconds -> minute ranges, not week -> month -> year ranges.
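One way to avoid an intermediate 1 s series, sketched here in TypeScript under stated assumptions, is to split each observation across the target intervals in proportion to how much of it overlaps each interval. This is not the current time-sync implementation or a planned fix; `resampleSum` and the `TimedValue` shape are hypothetical, and the sketch only covers values that aggregate by summing (such as energy or carbon), not averaged or copied ones.

```typescript
// Hypothetical sketch, not IF code: resample summed values straight onto the
// target grid, so memory scales with the number of output intervals rather
// than with the number of seconds in the window.
interface TimedValue {
  start: number; // observation start, epoch seconds
  duration: number; // observation length in seconds
  value: number; // quantity that aggregates by summing
}

function resampleSum(
  observations: TimedValue[],
  windowStart: number,
  windowEnd: number,
  interval: number
): number[] {
  if (interval <= 0 || windowEnd <= windowStart) {
    throw new RangeError('interval must be positive and the window non-empty');
  }
  const bucketCount = Math.ceil((windowEnd - windowStart) / interval);
  const buckets = new Array<number>(bucketCount).fill(0);

  for (const obs of observations) {
    // Clip the observation to the requested window.
    const from = Math.max(obs.start, windowStart);
    const to = Math.min(obs.start + obs.duration, windowEnd);
    if (obs.duration <= 0 || to <= from) continue;

    const first = Math.floor((from - windowStart) / interval);
    const last = Math.ceil((to - windowStart) / interval) - 1;
    for (let i = first; i <= last; i++) {
      const bucketStart = windowStart + i * interval;
      const bucketEnd = Math.min(bucketStart + interval, windowEnd);
      const overlap = Math.min(to, bucketEnd) - Math.max(from, bucketStart);
      // Each bucket receives the share of the value that falls inside it.
      if (overlap > 0) buckets[i] += obs.value * (overlap / obs.duration);
    }
  }
  return buckets;
}

// One 400,000 s observation resampled to 100-hour (360,000 s) intervals:
// about nine tenths of the value lands in the first interval, the rest in the second.
const resampled = resampleSum(
  [{start: 0, duration: 400000, value: 100}],
  0,
  400000,
  360000
);
console.log(resampled);
```

The trade-off is that this treats every observation as uniformly distributed over its duration, which is the same simplification any upsampling of coarse data has to make.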

Scope of work:

zanete commented 3 months ago

There is also a community PR open that could fix this

zanete commented 2 months ago

@jmcook1186 any chance you could point us to the manifest that was failing?

jmcook1186 commented 2 months ago

@zanete @narekhovhannisyan

The manifest below yields the following error:

<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----

 1: 0xcb8196 node::OOMErrorHandler(char const*, v8::OOMDetails const&) [node]
 2: 0x1033090 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
 3: 0x1033377 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
 4: 0x12525c5  [node]
 5: 0x1252a9e  [node]
 6: 0x1267cc6 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::internal::GarbageCollectionReason, char const*) [node]
 7: 0x12687e9  [node]
 8: 0x1268df8  [node]
 9: 0x19b8811  [node]
Aborted (core dumped)

However, I really had to push the time sync parameters to an extreme set of values to cause this!

name: nesting
description: a manifest that includes nested child components
tags:
  kind: web
  complexity: moderate
  category: on-premise
aggregation:
  metrics:
    - carbon
  type: "both"
initialize:
  plugins:
    "interpolate":
      method: Interpolation
      path: "builtin"
      config:
        method: linear
        x: [0, 10, 50, 100]
        y: [0.12, 0.32, 0.75, 1.02]
        input-parameter: "cpu/utilization"
        output-parameter: "cpu-factor"
      parameter-metadata:
        inputs:
          cpu/utilization:
            unit: percentage
            description: refers to CPU utilization.
            aggregation-method:
              time: avg
              component: sum
        outputs:
          cpu-factor:
            unit: kWh
            description: result of interpolate
            aggregation-method:
              time: avg
              component: avg
    "cpu-factor-to-wattage":
      method: Multiply
      path: builtin
      config:
        input-parameters: ["cpu-factor", "cpu/thermal-design-power"]
        output-parameter: "cpu-wattage"
      parameter-metadata:
        inputs:
          cpu-factor:
            unit: kWh
            description: result of interpolate
            aggregation-method:
              time: avg
              component: avg
          cpu/thermal-design-power:
            unit: kWh
            description: thermal design power for a processor
            aggregation-method:
              time: avg
              component: avg
        outputs:
          cpu-wattage:
            unit: kWh
            description: the energy used by the CPU
            aggregation-method:
              time: sum
              component: sum
    "wattage-times-duration":
      method: Multiply
      path: builtin
      config:
        input-parameters: ["cpu-wattage", "duration"]
        output-parameter: "cpu-wattage-times-duration"
    "wattage-to-energy-kwh":
      method: Divide
      path: "builtin"
      config:
        numerator: cpu-wattage-times-duration
        denominator: 3600000
        output: cpu-energy-raw
      parameter-metadata:
        inputs:
          cpu-wattage-times-duration:
            unit: kWh
            description: CPU wattage multiplied by duration
            aggregation-method:
              time: sum
              component: sum
        outputs:
          cpu-energy-raw:
            unit: kWh
            description: Raw energy used by CPU in kWh
            aggregation-method:
              time: sum
              component: sum
    "calculate-vcpu-ratio":
      method: Divide
      path: "builtin"
      config:
        numerator: vcpus-total
        denominator: vcpus-allocated
        output: vcpu-ratio
      parameter-metadata:
        outputs:
          vcpu-ratio:
            unit: none
            description: Ratio of vCPUs
            aggregation-method:
              time: copy
              component: copy
    "correct-cpu-energy-for-vcpu-ratio":
      method: Divide
      path: "builtin"
      config:
        numerator: cpu-energy-raw
        denominator: vcpu-ratio
        output: cpu-energy-kwh
    sci-embodied:
      path: "builtin"
      method: SciEmbodied
    "operational-carbon":
      method: Multiply
      path: builtin
      config:
        input-parameters: ["cpu-energy-kwh", "grid/carbon-intensity"]
        output-parameter: "carbon-operational"
      parameter-metadata:
        inputs:
          cpu-energy-kwh:
            unit: kWh
            description: Corrected CPU energy in kWh
            aggregation-method:
              time: sum
              component: sum
          grid/carbon-intensity:
            unit: gCO2eq/kWh
            description: Carbon intensity for the grid
            aggregation-method:
              time: avg
              component: avg
        outputs:
          carbon-operational:
            unit: gCO2eq
            description: Operational carbon footprint
            aggregation-method:
              time: sum
              component: sum
    sci:
      path: "builtin"
      method: Sci
      config:
        functional-unit: "requests"
      parameter-metadata:
        inputs:
          requests:
            unit: none
            description: expressed the final SCI value
            aggregation-method:
              time: sum
              component: sum
    "sum-carbon":
      path: "builtin"
      method: Sum
      config:
        input-parameters:
          - carbon-operational
          - embodied-carbon
        output-parameter: carbon
      parameter-metadata:
        inputs:
          carbon-operational:
            description: Operational carbon footprint
            unit: gCO2eq
            aggregation-method:
              time: sum
              component: sum
          embodied-carbon:
            description: Embodied carbon footprint
            unit: gCO2eq
            aggregation-method:
              time: sum
              component: sum
        outputs:
          carbon:
            description: Total carbon footprint
            unit: gCO2eq
            aggregation-method:
              time: sum
              component: sum
    time-sync:
      method: TimeSync
      path: "builtin"
      config:
        start-time: '2023-01-01T00:00:00.000Z'
        end-time: '2024-01-01T00:00:00.000Z'
        interval: 2
        allow-padding: true
      parameter-metadata:
        inputs:
          timestamp:
            unit: RFC3339
            description: refers to the time of occurrence of the input
            aggregation-method:
              time: none
              component: none
          duration:
            unit: seconds
            description: refers to the duration of the input
            aggregation-method:
              time: sum
              component: sum
          cloud/instance-type:
            unit: none
            description: type of Cloud Instance name used in the cloud provider APIs
            aggregation-method:
              time: copy
              component: copy
          cloud/region:
            unit: none
            description: region cloud instance
            aggregation-method:
              time: copy
              component: copy
          time-reserved:
            unit: seconds
            description: time reserved for a component
            aggregation-method:
              time: avg
              component: avg
          network/energy:
            description: "Energy consumed by the Network of the component"
            unit: "kWh"
            aggregation-method:
              time: sum
              component: sum

tree:
  children:
    child-0:
      defaults:
        cpu/thermal-design-power: 100
        grid/carbon-intensity: 800
        device/emissions-embodied: 1533.120 # gCO2eq
        time-reserved: 3600 # 1hr in seconds
        device/expected-lifespan: 94608000 # 3 years in seconds
        vcpus-allocated: 1
        vcpus-total: 8
      pipeline:
        compute:
          - interpolate
          - cpu-factor-to-wattage
          - wattage-times-duration
          - wattage-to-energy-kwh
          - calculate-vcpu-ratio
          - correct-cpu-energy-for-vcpu-ratio
          - sci-embodied
          - operational-carbon
          - sum-carbon
          - time-sync
          - sci
      inputs:
        - timestamp: "2023-01-01T00:00:00.000Z"
          cloud/instance-type: A1
          cloud/region: uk-west
          duration: 2629800
          cpu/utilization: 50
          network/energy: 0.000001
          requests: 50
        - timestamp: "2023-02-01T00:00:00.000Z"
          duration: 2629800
          cpu/utilization: 20
          cloud/instance-type: A1
          cloud/region: uk-west
          network/energy: 0.000001
          requests: 60
        - timestamp: "2023-03-01T00:00:00.000Z"
          duration: 2629800
          cpu/utilization: 15
          cloud/instance-type: A1
          cloud/region: uk-west
          network/energy: 0.000001
          requests: 70
        - timestamp: "2023-04-01T00:00:00.000Z"
          duration: 2629800
          cloud/instance-type: A1
          cloud/region: uk-west
          cpu/utilization: 15
          network/energy: 0.000001
          requests: 55
        - timestamp: "2023-05-01T00:00:00.000Z"
          duration: 2629800
          cloud/instance-type: A1
          cloud/region: uk-west
          cpu/utilization: 15
          network/energy: 0.000001
          requests: 55
        - timestamp: "2023-06-01T00:00:00.000Z"
          duration: 2629800
          cloud/instance-type: A1
          cloud/region: uk-west
          cpu/utilization: 15
          network/energy: 0.000001
          requests: 55
        - timestamp: "2023-07-01T00:00:00.000Z"
          duration: 2629800
          cloud/instance-type: A1
          cloud/region: uk-west
          cpu/utilization: 15
          network/energy: 0.000001
          requests: 55
    child-1:
      defaults:
        cpu/thermal-design-power: 100
        grid/carbon-intensity: 800
        device/emissions-embodied: 1533.120 # gCO2eq
        time-reserved: 3600 # 1hr in seconds
        device/expected-lifespan: 94608000 # 3 years in seconds
        vcpus-allocated: 1
        vcpus-total: 8
      pipeline:
        compute:
          - interpolate
          - cpu-factor-to-wattage
          - wattage-times-duration
          - wattage-to-energy-kwh
          - calculate-vcpu-ratio
          - correct-cpu-energy-for-vcpu-ratio
          - sci-embodied
          - operational-carbon
          - sum-carbon
          - time-sync
          - sci
      inputs:
        - timestamp: "2023-01-01T00:00:00.000Z"
          cloud/instance-type: A1
          cloud/region: uk-west
          duration: 2629800
          cpu/utilization: 50
          network/energy: 0.000001
          requests: 50
        - timestamp: "2023-02-01T00:00:00.000Z"
          duration: 2629800
          cpu/utilization: 20
          cloud/instance-type: A1
          cloud/region: uk-west
          network/energy: 0.000001
          requests: 60
        - timestamp: "2023-03-01T00:00:00.000Z"
          duration: 2629800
          cpu/utilization: 15
          cloud/instance-type: A1
          cloud/region: uk-west
          network/energy: 0.000001
          requests: 70
        - timestamp: "2023-04-01T00:00:00.000Z"
          duration: 2629800
          cloud/instance-type: A1
          cloud/region: uk-west
          cpu/utilization: 15
          network/energy: 0.000001
          requests: 55
        - timestamp: "2023-05-01T00:00:00.000Z"
          duration: 2629800
          cloud/instance-type: A1
          cloud/region: uk-west
          cpu/utilization: 15
          network/energy: 0.000001
          requests: 55
        - timestamp: "2023-06-01T00:00:00.000Z"
          duration: 2629800
          cloud/instance-type: A1
          cloud/region: uk-west
          cpu/utilization: 15
          network/energy: 0.000001
          requests: 55
        - timestamp: "2023-07-01T00:00:00.000Z"
          duration: 2629800
          cloud/instance-type: A1
          cloud/region: uk-west
          cpu/utilization: 15
          network/energy: 0.000001
          requests: 55
    child-2:
      children:
        child-2-0:
          defaults:
            cpu/thermal-design-power: 100
            grid/carbon-intensity: 800
            device/emissions-embodied: 1533.120 # gCO2eq
            time-reserved: 3600 # 1hr in seconds
            device/expected-lifespan: 94608000 # 3 years in seconds
            vcpus-allocated: 1
            vcpus-total: 8
          pipeline:
            compute:
              - interpolate
              - cpu-factor-to-wattage
              - wattage-times-duration
              - wattage-to-energy-kwh
              - calculate-vcpu-ratio
              - correct-cpu-energy-for-vcpu-ratio
              - sci-embodied
              - operational-carbon
              - sum-carbon
              - time-sync
              - sci
          inputs:
            - timestamp: "2023-01-01T00:00:00.000Z"
              cloud/instance-type: A1
              cloud/region: uk-west
              duration: 2629800
              cpu/utilization: 50
              network/energy: 0.000001
              requests: 50
            - timestamp: "2023-02-01T00:00:00.000Z"
              duration: 2629800
              cpu/utilization: 20
              cloud/instance-type: A1
              cloud/region: uk-west
              network/energy: 0.000001
              requests: 60
            - timestamp: "2023-03-01T00:00:00.000Z"
              duration: 2629800
              cpu/utilization: 15
              cloud/instance-type: A1
              cloud/region: uk-west
              network/energy: 0.000001
              requests: 70
            - timestamp: "2023-04-01T00:00:00.000Z"
              duration: 2629800
              cloud/instance-type: A1
              cloud/region: uk-west
              cpu/utilization: 15
              network/energy: 0.000001
              requests: 55
            - timestamp: "2023-05-01T00:00:00.000Z"
              duration: 2629800
              cloud/instance-type: A1
              cloud/region: uk-west
              cpu/utilization: 15
              network/energy: 0.000001
              requests: 55
            - timestamp: "2023-06-01T00:00:00.000Z"
              duration: 2629800
              cloud/instance-type: A1
              cloud/region: uk-west
              cpu/utilization: 15
              network/energy: 0.000001
              requests: 55
            - timestamp: "2023-07-01T00:00:00.000Z"
              duration: 2629800
              cloud/instance-type: A1
              cloud/region: uk-west
              cpu/utilization: 15
              network/energy: 0.000001
              requests: 55
        child-2-1:
          defaults:
            cpu/thermal-design-power: 100
            grid/carbon-intensity: 800
            device/emissions-embodied: 1533.120 # gCO2eq
            time-reserved: 3600 # 1hr in seconds
            device/expected-lifespan: 94608000 # 3 years in seconds
            vcpus-allocated: 1
            vcpus-total: 8
          pipeline:
            compute:
              - interpolate
              - cpu-factor-to-wattage
              - wattage-times-duration
              - wattage-to-energy-kwh
              - calculate-vcpu-ratio
              - correct-cpu-energy-for-vcpu-ratio
              - sci-embodied
              - operational-carbon
              - sum-carbon
              - time-sync
              - sci
          inputs:
            - timestamp: "2023-01-01T00:00:00.000Z"
              cloud/instance-type: A1
              cloud/region: uk-west
              duration: 2629800
              cpu/utilization: 50
              network/energy: 0.000001
              requests: 50
            - timestamp: "2023-02-01T00:00:00.000Z"
              duration: 2629800
              cpu/utilization: 20
              cloud/instance-type: A1
              cloud/region: uk-west
              network/energy: 0.000001
              requests: 60
            - timestamp: "2023-03-01T00:00:00.000Z"
              duration: 2629800
              cpu/utilization: 15
              cloud/instance-type: A1
              cloud/region: uk-west
              network/energy: 0.000001
              requests: 70
            - timestamp: "2023-04-01T00:00:00.000Z"
              duration: 2629800
              cloud/instance-type: A1
              cloud/region: uk-west
              cpu/utilization: 15
              network/energy: 0.000001
              requests: 55
            - timestamp: "2023-05-01T00:00:00.000Z"
              duration: 2629800
              cloud/instance-type: A1
              cloud/region: uk-west
              cpu/utilization: 15
              network/energy: 0.000001
              requests: 55
            - timestamp: "2023-06-01T00:00:00.000Z"
              duration: 2629800
              cloud/instance-type: A1
              cloud/region: uk-west
              cpu/utilization: 15
              network/energy: 0.000001
              requests: 55
            - timestamp: "2023-07-01T00:00:00.000Z"
              duration: 2629800
              cloud/instance-type: A1
              cloud/region: uk-west
              cpu/utilization: 15
              network/energy: 0.000001
              requests: 55
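
For scale, the time-sync settings in the manifest above (a one-year window at `interval: 2`) can be turned into a timestep count with a few lines of TypeScript. `countTimesteps` and `TimeSyncWindow` are hypothetical names used only for this estimate, not IF APIs, and whether IF counts the end timestamp inclusively is not modelled here.

```typescript
// Hypothetical sizing check, not part of IF: how many output timesteps does a
// time-sync window imply at a given interval (in seconds)?
interface TimeSyncWindow {
  'start-time': string;
  'end-time': string;
  interval: number;
}

function countTimesteps(config: TimeSyncWindow): number {
  const startMs = Date.parse(config['start-time']);
  const endMs = Date.parse(config['end-time']);
  if (Number.isNaN(startMs) || Number.isNaN(endMs) || endMs <= startMs) {
    throw new RangeError('start-time must be a valid date before end-time');
  }
  if (!(config.interval > 0)) {
    throw new RangeError('interval must be a positive number of seconds');
  }
  return Math.ceil((endMs - startMs) / 1000 / config.interval);
}

// Values copied from the time-sync config in the manifest.
const manifestWindow: TimeSyncWindow = {
  'start-time': '2023-01-01T00:00:00.000Z',
  'end-time': '2024-01-01T00:00:00.000Z',
  interval: 2,
};

const leafComponents = 4; // child-0, child-1, child-2-0, child-2-1
const stepsPerComponent = countTimesteps(manifestWindow);
const stepsAtTenHours = countTimesteps({...manifestWindow, interval: 36000});

console.log(stepsPerComponent); // 15768000
console.log(stepsPerComponent * leafComponents); // 63072000
console.log(stepsAtTenHours); // 876
```

Roughly 63 million output rows across the four leaf components is consistent with the heap running out, whereas the same window at a 10-hour interval would produce fewer than a thousand rows per component.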

zanete commented 1 month ago

@jmcook1186 please attach the new manifest that throws this error

zanete commented 1 month ago

@narekhovhannisyan your suggestion - to break down the file into logical chunks

zanete commented 1 month ago

Put it aside given #1057

zanete commented 4 weeks ago

As #1057 is not going to be implemented, this again becomes an issue to fix.

zanete commented 2 weeks ago

Will get back to this after finding out what the blocker was with the group-by issue.

zanete commented 1 week ago

Status update: expecting a PR by early next week