Energinet-DataHub / ARCHIVED-geh-aggregations

This project aims to create an engine that is able to do calculations on billions of metering points and deliver the results within minutes
Apache License 2.0

[being closed, to be replaced by #746] Time series from meter data responsible are included in calculations jobs #471

Closed MadsBloendAndersen closed 2 years ago

MadsBloendAndersen commented 3 years ago

Problem Description

Time series are core data for the aggregation domain.

In this feature, a solution for publishing time series to the aggregation domain must be designed and implemented, so that whenever time series enter the time-series domain, they are included in a calculation job.

The following flow must be supported when this feature is finished (see attached flow diagram).

Benefit Hypothesis

Time series provide the input values for the aggregation and wholesale jobs; without them we could not perform settlement.

Acceptance criteria and must have scope

Out of scope

Tech note:

  1. We need to be able to get notified whether a streaming job is running or has stopped. (surveillance) health / ops
    • What to do if streaming job fails and stops?
      • Retry logic is applicable for streaming jobs - LKI 25-01-2022
    • What happens to the received time series that are added to the event hub?
      • The events should remain available on the event hub until consumed by the streaming job, with a retention of up to seven days if the standard tier is selected for the event hub, reference link. - LKI 25-01-2022
      • Another option, allowing almost indefinite event retention, is to use Azure Event Hubs Capture (link1, link2), which essentially consumes events from the event hub and stores them in a storage account or data lake in Avro format. - LKI 25-01-2022
      • If opting for Azure Event Hubs Capture, an option is to use Databricks Auto Loader to stream events from the data lake into Delta Lake as new files are detected. This can be done using Avro as the file format. Read more on how to configure Auto Loader. - LKI 25-01-2022
    • How can we restart the streaming job and receive the queued up time series on the event hub?
      • It is possible to specify a checkpoint location for a streaming job, which holds information on which events have been processed. - LKI 25-01-2022
  2. Performance tests to document how much data we can handle. (needs metrics defined)
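The Capture + Auto Loader + checkpoint combination discussed in the tech note above could be sketched roughly as follows. This is only an illustrative sketch, not the project's actual job: all paths, container names, and options are assumptions, and a real job would also declare a schema or a `cloudFiles.schemaLocation`.

```python
# Hypothetical sketch: stream Avro files written by Azure Event Hubs Capture
# into a Delta table with Databricks Auto Loader. The checkpoint location is
# what lets a restarted streaming job resume from the last processed files,
# as noted in the tech note. All paths below are made-up examples.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

capture_path = "abfss://capture@examplelake.dfs.core.windows.net/timeseries/"
checkpoint_path = "abfss://checkpoints@examplelake.dfs.core.windows.net/timeseries-ingest/"
delta_path = "abfss://delta@examplelake.dfs.core.windows.net/timeseries/"

stream = (
    spark.readStream
    .format("cloudFiles")                 # Databricks Auto Loader source
    .option("cloudFiles.format", "avro")  # Event Hubs Capture writes Avro
    .load(capture_path)
)

(
    stream.writeStream
    .format("delta")
    .option("checkpointLocation", checkpoint_path)  # enables restart/resume
    .start(delta_path)
)
```

The checkpoint directory records which input files have already been processed, so retry logic for the streaming job (as mentioned above) does not reprocess or lose queued-up time series.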

Non-Functional Requirements

Stakeholders

Khatozen Irene Volt

Note:

mogensjuul commented 3 years ago

@msp Check

MadsBloendAndersen commented 3 years ago

Hey team! Please add your planning poker estimate with ZenHub @ZavezX @BjarkeMeier @HenrikSommer

MadsBloendAndersen commented 3 years ago

@BjarkeMeier @ZavezX @HenrikSommer estimate in "man-sprints" :)

MadsBloendAndersen commented 2 years ago

@PerTHenriksen link to Miro flow chart:

https://miro.com/welcomeonboard/dDV2WnFZSzBPbDJ2TmNOVjJjTFJDdkxzdUJmY1B2aTZmbWJwUkFDZFZoN2U2WWxadnZzaDJxb1JzemZaM2R0Y3wzMDc0NDU3MzUzMzI4MjY3MTIy?invite_link_id=187988064481