Green-Software-Foundation / if

Impact Framework
https://if.greensoftware.foundation/
MIT License

Epic - Global Time Sync #763

Open zanete opened 1 month ago

zanete commented 1 month ago

Background

Observations might be gathered from multiple sources with multiple different durations, start and stop times.

To aggregate up a tree to the grouping nodes, all the observations need to cover the same time buckets and durations, so that aggregation can happen across synchronous slices of time.
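To illustrate why aligned time buckets matter, here is a minimal sketch of aggregating child time series into a parent node. It is not IF's aggregation code: the `Slice` shape and the `aggregate` function are hypothetical names chosen for this example, and it assumes the metric is additive.

```typescript
// Hypothetical shape of one time slice; not taken from the IF codebase.
interface Slice {
  timestamp: string; // ISO 8601 start of the slice
  duration: number; // seconds
  value: number; // any additive metric
}

// Sum child series slice by slice. This only makes sense when every
// series has the same timestamps and durations, so we fail loudly otherwise.
function aggregate(children: Slice[][]): Slice[] {
  const first = children[0];
  if (first === undefined) return [];
  const rest = children.slice(1);

  for (const series of rest) {
    if (series.length !== first.length) {
      throw new Error('Series have different lengths');
    }
  }

  return first.map((slice, i) => {
    let total = slice.value;
    for (const series of rest) {
      const other = series[i];
      if (
        other === undefined ||
        other.timestamp !== slice.timestamp ||
        other.duration !== slice.duration
      ) {
        throw new Error(`Series are not synchronized at index ${i}`);
      }
      total += other.value;
    }
    return {timestamp: slice.timestamp, duration: slice.duration, value: total};
  });
}
```

If one child reports hourly and another every 30 minutes, the check throws instead of silently adding values that describe different slices of time.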

To support that, we created a builtin plugin called TimeSync. It snaps observations onto a global grid, e.g. every 1hr during a day.
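The idea of snapping onto a global grid can be sketched as follows. This is a simplified illustration rather than the TimeSync implementation: the `Observation` shape, the `energy` field and the `snapToGrid` function are assumptions made for this example. It assumes the metric is additive and evenly spread over each observation's duration, and that the grid window is a whole multiple of the interval.

```typescript
// Hypothetical observation shape; field names are assumptions for illustration.
interface Observation {
  timestamp: string; // ISO 8601 start time
  duration: number; // seconds
  energy: number; // additive metric covering the whole duration
}

// Snap observations onto a fixed grid by spreading each value evenly over
// its duration and summing the share that overlaps each grid bucket.
// Assumes (end - start) is a whole multiple of `interval` (in seconds).
function snapToGrid(
  observations: Observation[],
  start: string,
  end: string,
  interval: number
): Observation[] {
  const startMs = Date.parse(start);
  const endMs = Date.parse(end);
  const stepMs = interval * 1000;

  // Build empty buckets covering the global window.
  const buckets: Observation[] = [];
  for (let t = startMs; t < endMs; t += stepMs) {
    buckets.push({
      timestamp: new Date(t).toISOString(),
      duration: interval,
      energy: 0,
    });
  }

  for (const obs of observations) {
    if (obs.duration <= 0) continue; // nothing to distribute
    const obsStart = Date.parse(obs.timestamp);
    const totalMs = obs.duration * 1000;
    const obsEnd = obsStart + totalMs;

    buckets.forEach((bucket, i) => {
      const bucketStart = startMs + i * stepMs;
      const bucketEnd = bucketStart + stepMs;
      const overlap =
        Math.min(obsEnd, bucketEnd) - Math.max(obsStart, bucketStart);
      if (overlap > 0) {
        // Share of the value proportional to the overlapping time.
        bucket.energy += (obs.energy * overlap) / totalMs;
      }
    });
  }

  return buckets;
}
```

For example, a single 90-minute observation with `energy: 90` starting at midnight, snapped onto an hourly grid from 00:00 to 03:00, yields three buckets with energies 60, 30 and 0. Anything outside the window is dropped in this sketch.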

TimeSync was built as a plugin rather than a framework feature because we were not sure where in the computation process time syncing could always be enforced. Some plugins import data from other places and generate observations, so time syncing has to happen after them. Other plugins, such as the WattTime plugin, depend on the time, so time syncing has to happen before them. As a result, the user has to know in advance where in the pipeline to execute TimeSync. The correct position is not always obvious, and positioning mistakes can lead to IF failures. This adds fragility and makes for a suboptimal developer experience.

Problem statement

Figuring out exactly where you need to insert the time-sync plugin in your pipeline is a bit of a friction point in IF development. It is not always obvious where TimeSync should be positioned in the pipeline, but it has to be correct to ensure the observations are synced in advance of any aggregation or execution of plugins that rely on regular, corrected timing.

Aggregation is a builtin feature configured at the top of the manifest file, yet to use it you might need to ensure that each of your pipelines has a time-sync plugin at the right step. This creates a very strong dependency between a framework feature (“aggregation”) and a pipeline plugin, which can be awkward to reason about.

Proposed solution

First, we need to have shipped the tasks in the idempotence epic, which breaks IF execution into three distinct phases: observe, group and compute. With phased execution, TimeSync has a clear, fixed position in the execution flow: immediately after group and immediately before compute.

Once IF has phased execution, this will always be the right moment to synchronize time. This is because group should always yield individual time series with unique, non-repeated timestamps that TimeSync can handle, and compute will always operate over synchronized time series.

This means TimeSync can be an IF feature rather than a plugin. We still need some config from the manifest, which can be provided at the top level in the manifest's context. If this config is present, TimeSync should be executed automatically between the group and compute stages of execution.
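The proposed flow could be sketched roughly as below. Every name here (`Context`, `TimeSyncConfig`, `buildPhases`, `execute`, and the config field names) is hypothetical and not taken from the IF codebase. The sketch only shows the ordering rule: time sync is inserted between group and compute when, and only when, its config is present in the context.

```typescript
// Hypothetical types for illustration only.
interface TimeSyncConfig {
  startTime: string;
  endTime: string;
  interval: number; // seconds
}

interface Context {
  timeSync?: TimeSyncConfig; // optional top-level config
}

type Tree = Record<string, unknown>;
type Phase = {name: string; run: (tree: Tree) => Tree};

// Build the ordered list of phases. Time sync has a fixed slot:
// immediately after group and immediately before compute.
function buildPhases(
  context: Context,
  observe: Phase['run'],
  group: Phase['run'],
  compute: Phase['run'],
  sync: (tree: Tree, config: TimeSyncConfig) => Tree
): Phase[] {
  const phases: Phase[] = [
    {name: 'observe', run: observe},
    {name: 'group', run: group},
  ];

  const config = context.timeSync;
  if (config !== undefined) {
    phases.push({name: 'time-sync', run: tree => sync(tree, config)});
  }

  phases.push({name: 'compute', run: compute});
  return phases;
}

// Run the phases in order, feeding each phase the previous phase's output.
function execute(tree: Tree, phases: Phase[]): Tree {
  return phases.reduce((current, phase) => phase.run(current), tree);
}
```

With this shape, users never position time sync themselves. Leaving the config out of the context simply skips the step.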

Related discussion https://github.com/Green-Software-Foundation/if/discussions/771

Tasks

Note: all the tasks in the idempotence epic are prerequisites for the following tasks:

zanete commented 1 week ago

During feature sizing, the team estimated this feature at t-shirt size L.