Why: Sub-issue of #949, needed to create a realistic manifest file for the GSF website.
What: We need the ability to carry out simple arithmetic between values held in different components.
Context
Let's say we have a component containing a time series for the number of page views per hour, which may have been populated using an importer plugin for, e.g., Google Analytics.
We also have a separate component, e.g. `web-server`, that has energy and carbon impacts for the same time intervals as the page visits.
Now we want to calculate our SCI score by dividing `carbon` in each observation in the `web-server` component by the page views in the `page-views` component. We can't, because all the information needed to process an observation and create a new output value has to exist within the same component as that observation.
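A minimal sketch of the element-wise operation being asked for, written in TypeScript (the language IF is written in). It assumes each component's outputs are an array of observations carrying a `timestamp` field; the `Observation` type and the `divideAcrossComponents` helper are hypothetical and not part of IF. Pairing observations by timestamp rather than by array position means that series which were never time-synced fail loudly instead of silently dividing the wrong hours.

```typescript
// Illustrative observation shape; not the Impact Framework's own type.
interface Observation {
  timestamp: string;
  [param: string]: string | number;
}

// Divide `numeratorKey` in each observation of `target` by `denominatorKey` in the
// observation with the same timestamp in `source`, writing the result to `outputKey`.
function divideAcrossComponents(
  target: Observation[],
  source: Observation[],
  numeratorKey: string,
  denominatorKey: string,
  outputKey: string,
): Observation[] {
  // Index the other component's series by timestamp so ordering doesn't matter.
  const byTimestamp = new Map<string, Observation>();
  for (const obs of source) byTimestamp.set(obs.timestamp, obs);

  return target.map((obs) => {
    const match = byTimestamp.get(obs.timestamp);
    if (match === undefined) {
      // The two series are not on a common grid, e.g. time sync has not run yet.
      throw new Error(`no ${denominatorKey} value at ${obs.timestamp}`);
    }
    // Note: a zero denominator yields Infinity rather than an error.
    const out: Observation = { ...obs };
    out[outputKey] = Number(obs[numeratorKey]) / Number(match[denominatorKey]);
    return out;
  });
}

// Usage: carbon from `web-server` divided by hourly page views from `page-views`.
const webServer: Observation[] = [
  { timestamp: "2024-01-01T00:00:00Z", carbon: 120 },
  { timestamp: "2024-01-01T01:00:00Z", carbon: 90 },
];
const pageViews: Observation[] = [
  { timestamp: "2024-01-01T00:00:00Z", "page-views": 60 },
  { timestamp: "2024-01-01T01:00:00Z", "page-views": 30 },
];
const withSci = divideAcrossComponents(webServer, pageViews, "carbon", "page-views", "sci");
console.log(withSci.map((obs) => obs.sci)); // sci values 2 and 3
```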
This is problematic because it means we either have to know the page views in advance and manually add them everywhere they are needed across the manifest, or run an importer plugin for every component in the tree that needs that data, leading to a lot of repetition, more points of failure and unnecessary carbon expenditure.
What this amounts to is that, today, unless we make manual interventions to the manifest, we cannot use time series data as the functional unit in SCI calculations.
Here's what we want to be able to do:
- we have a component, component A, in the tree whose input data is a time series populated using an importer plugin; this component tracks page visits for a website
- we have a number of other components, B -> Z, that also have input data and a pipeline of plugins that eventually yields carbon per timestep
- we run time sync so everything is snapped to a common grid
- we then calculate SCI for all of components B -> Z by dividing the carbon in each timestep of their time series, element-wise, by the site visits in component A's time series
- we aggregate the SCI values, skipping component A because it has no carbon values
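The workflow above could look roughly like this once all outputs are time-synced. This is a hypothetical sketch rather than IF's implementation: the `Component` shape, the `sciAcrossTree` function and the use of summation as the aggregation method are all assumptions. Components whose outputs have no numeric `carbon` value (component A here) are skipped, matching the final step.

```typescript
// Illustrative shapes; not the Impact Framework's own types.
interface Observation {
  timestamp: string;
  [param: string]: string | number;
}

interface Component {
  name: string;
  outputs: Observation[]; // assumed already time-synced onto a common grid
}

// Divide each component's carbon by `unitKey` taken from `unitComponent`, then sum
// the sci values across the tree, skipping components without carbon values.
function sciAcrossTree(
  components: Component[],
  unitComponent: Component,
  unitKey: string,
): { perComponent: Map<string, number[]>; total: number } {
  const perComponent = new Map<string, number[]>();
  let total = 0;
  for (const component of components) {
    const hasCarbon =
      component.outputs.length > 0 &&
      component.outputs.every((obs) => typeof obs.carbon === "number");
    if (!hasCarbon) continue; // e.g. the page-visits component itself

    const sci = component.outputs.map((obs, i) => {
      const unitObs = unitComponent.outputs[i];
      // After time sync, index i should refer to the same timestamp in both series.
      if (unitObs === undefined || unitObs.timestamp !== obs.timestamp) {
        throw new Error(`${component.name} is not on the same grid as ${unitComponent.name}`);
      }
      return Number(obs.carbon) / Number(unitObs[unitKey]);
    });
    perComponent.set(component.name, sci);
    total += sci.reduce((sum, value) => sum + value, 0);
  }
  return { perComponent, total };
}

// Usage: A holds site visits; B and C hold carbon for the same two timesteps.
const visits: Component = {
  name: "A",
  outputs: [
    { timestamp: "t0", "site-visits": 10 },
    { timestamp: "t1", "site-visits": 20 },
  ],
};
const serverB: Component = { name: "B", outputs: [{ timestamp: "t0", carbon: 20 }, { timestamp: "t1", carbon: 40 }] };
const serverC: Component = { name: "C", outputs: [{ timestamp: "t0", carbon: 10 }, { timestamp: "t1", carbon: 60 }] };
const result = sciAcrossTree([visits, serverB, serverC], visits, "site-visits");
console.log(result.perComponent, result.total); // B: [2, 2], C: [1, 3], total 8
```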
We might have to assert that `--observe` plugins across the whole tree are executed before any `--compute` plugins are executed; otherwise certain compute plugins have ordering requirements (e.g. we could try to execute an `sci` plugin that relies on a functional unit in another component whose values haven't been imported yet).
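One way to meet that ordering constraint is a two-pass traversal: walk the whole tree running observe plugins, then walk it again running compute plugins. The sketch below models both kinds of plugin as plain functions that share one `Map`; `TreeNode`, `walk` and `runTree` are hypothetical names, not IF APIs.

```typescript
// Hypothetical stand-in for a manifest tree node. "observe" and "compute" model the
// two plugin phases as plain functions that share one store across the tree.
type Store = Map<string, unknown>;
type PhasePlugin = (node: TreeNode, store: Store) => void;

interface TreeNode {
  name: string;
  observe: PhasePlugin[];
  compute: PhasePlugin[];
  children: TreeNode[];
}

// Depth-first, pre-order visit of every node in the tree.
function walk(node: TreeNode, visit: (n: TreeNode) => void): void {
  visit(node);
  for (const child of node.children) walk(child, visit);
}

// Pass 1 runs every observe plugin in the whole tree; only then does pass 2 run
// any compute plugin, so compute steps can rely on data observed elsewhere.
function runTree(root: TreeNode): Store {
  const store: Store = new Map();
  walk(root, (n) => n.observe.forEach((plugin) => plugin(n, store)));
  walk(root, (n) => n.compute.forEach((plugin) => plugin(n, store)));
  return store;
}

// Usage: web-server precedes page-views in tree order, yet still sees the imported value.
const log: string[] = [];
const webServer: TreeNode = {
  name: "web-server",
  observe: [],
  compute: [(n, store) => { log.push(`${n.name} sees ${String(store.get("page-views"))}`); }],
  children: [],
};
const pageViews: TreeNode = {
  name: "page-views",
  observe: [(n, store) => { store.set("page-views", 42); log.push(`${n.name} observed`); }],
  compute: [],
  children: [],
};
runTree({ name: "root", observe: [], compute: [], children: [webServer, pageViews] });
console.log(log); // "page-views observed", then "web-server sees 42"
```

A single interleaved pass would have run the `web-server` compute step before `page-views` was observed, which is exactly the ordering problem described above.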
Note: Why not just use the importer inside each component and add the page visits to each observation?
A few reasons. First, it's a wasteful way to get the data: it would require an external API call per component for data we already have, which is inefficient in time, energy and carbon. Second, it's plausible that the response could change from one component to the next. Third, it requires that the data arriving from the importer is already synced with the existing set of timestamps, which it may or may not be, and that would be tricky to handle internally. For these reasons I think separate components plus cross-component operations are the way to go.
Narek's implementation notes
To let the framework know that we want to reuse the observed value in other child components, we have to pass a `store-result: true` flag to the plugin config in the `initialize` section, like this:
azure-importer:
  store-result: true
  ...
In the pipeline, the user can specify the plugin name and the name of the component whose data should be reused:
Note from @jmcook1186: I prefer something like `global: true` over `store-result: true`. Then we can invoke it using `global: page-views` rather than the original component name.
Meanwhile, the framework will check whether the name in the compute section is present in the plugin storage. If it is, the plugin is executed from scratch; otherwise, the framework checks the results storage for any data saved by a previous child component.
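A minimal sketch of the lookup order described in these notes, assuming results are stored under the name of the component that produced them (the notes leave the key open; the `global:` alias suggested above would work the same way with a different key). `pluginStorage`, `resultStorage`, `runStep` and the `importer` plugin are hypothetical, and the flag is modelled as a boolean `storeResult` field rather than the YAML key itself.

```typescript
// Illustrative shapes; `storeResult` stands in for the proposed `store-result: true`
// (or `global: true`) flag. Results are keyed by the producing component's name.
type Observation = Record<string, string | number>;

interface PluginEntry {
  execute: (inputs: Observation[]) => Observation[];
  storeResult: boolean;
}

// Plugins declared in the `initialize` section, keyed by plugin name.
const pluginStorage = new Map<string, PluginEntry>();
// Outputs saved by earlier components whose plugin had the flag set.
const resultStorage = new Map<string, Observation[]>();

// Resolve one pipeline step for a component: a registered plugin name is executed
// from scratch; any other name is looked up among results saved by earlier components.
function runStep(componentName: string, step: string, inputs: Observation[]): Observation[] {
  const plugin = pluginStorage.get(step);
  if (plugin !== undefined) {
    const outputs = plugin.execute(inputs);
    if (plugin.storeResult) resultStorage.set(componentName, outputs);
    return outputs;
  }
  const saved = resultStorage.get(step);
  if (saved === undefined) {
    throw new Error(`"${step}" is neither a registered plugin nor a stored result`);
  }
  return saved;
}

// Usage: the page-views component runs the importer once; web-server reuses its output.
pluginStorage.set("importer", {
  execute: () => [{ timestamp: "t0", "page-views": 60 }],
  storeResult: true,
});
runStep("page-views", "importer", []);
const reused = runStep("web-server", "page-views", []);
console.log(reused[0]["page-views"]); // 60
```

The importer runs once, and every later component reads the stored series instead of making its own API call, which is the saving argued for earlier in this issue.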
Scope of work:
- [ ] IF behaviour updated to enable cross-component operations
- [ ] documentation updated
- [ ] test cases added
Acceptance Criteria
Scenario 1
GIVEN the cross-component operations are working
WHEN I run the following manifest:
THEN I get the following output: