Energinet-DataHub / ARCHIVED-geh-aggregations

This project aims to create an engine that is able to do calculations on billions of metering points and deliver the results within minutes
Apache License 2.0

[being closed, to be replaced by #746] Time series from meter data responsible are included in calculations jobs #471

Closed MadsBloendAndersen closed 2 years ago

MadsBloendAndersen commented 3 years ago

Problem Description

Time series are core data for the aggregation domain.

In this feature, a solution for publishing time series to the aggregation domain must be designed and implemented, so that whenever time series enter the time-series domain, they are included in a calculation job.

The following flow must be supported when this feature is finished (see attached flow diagram).

Benefit Hypothesis

Time series provide the input values for the aggregation and wholesale jobs; without them we could not perform settlement.

Acceptance criteria and must have scope

Out of scope

Tech note:

  1. We need to be able to get notified whether a streaming job is running or has stopped. (surveillance) health / ops
    • What to do if streaming job fails and stops?
      • Retry logic is applicable for streaming jobs - LKI 25-01-2022
    • What happens to the received time series that are added to the event hub?
      • The events should remain available on the event hub until consumed by the streaming job, with a retention of up to seven days if the standard tier is selected for the event hub, reference link. - LKI 25-01-2022
      • Another option, allowing almost indefinite event retention, is to use Azure Event Hubs Capture (link1, link2), which essentially consumes events from the event hub and stores them in a storage account or data lake in Avro format. - LKI 25-01-2022
      • If opting for Azure Event Hubs Capture, an option is to use Databricks Auto Loader to stream events from the data lake into Delta Lake as new files are detected. This can be done using Avro as the file format. Read more on how to configure Auto Loader. - LKI 25-01-2022
    • How can we restart the streaming job and receive the queued up time series on the event hub?
      • It is possible to specify a checkpoint location for a streaming job, which holds information on which events have been processed. - LKI 25-01-2022
  2. Performance tests to document how much data we can handle. (needs metrics defined)
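The Capture + Auto Loader + checkpoint combination discussed in the tech note above could be sketched roughly as follows. This is only an illustrative sketch, not the project's actual job: all paths, container names, and options are assumptions, and a real job would also declare a schema or a `cloudFiles.schemaLocation`.

```python
# Hypothetical sketch: stream Avro files written by Azure Event Hubs Capture
# into a Delta table with Databricks Auto Loader. The checkpoint location is
# what lets a restarted streaming job resume from the last processed files,
# as noted in the tech note. All paths below are made-up examples.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

capture_path = "abfss://capture@examplelake.dfs.core.windows.net/timeseries/"
checkpoint_path = "abfss://checkpoints@examplelake.dfs.core.windows.net/timeseries-ingest/"
delta_path = "abfss://delta@examplelake.dfs.core.windows.net/timeseries/"

stream = (
    spark.readStream
    .format("cloudFiles")                 # Databricks Auto Loader source
    .option("cloudFiles.format", "avro")  # Event Hubs Capture writes Avro
    .load(capture_path)
)

(
    stream.writeStream
    .format("delta")
    .option("checkpointLocation", checkpoint_path)  # enables restart/resume
    .start(delta_path)
)
```

The checkpoint directory records which input files have already been processed, so retry logic for the streaming job (as mentioned above) does not reprocess or lose queued-up time series.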

Non-Functional Requirements

Stakeholders

Khatozen Irene Volt

Note:

mogensjuul commented 3 years ago

@msp Check

MadsBloendAndersen commented 3 years ago

Hey team! Please add your planning poker estimate with ZenHub @ZavezX @BjarkeMeier @HenrikSommer

MadsBloendAndersen commented 3 years ago

@BjarkeMeier @ZavezX @HenrikSommer estimate in "man-sprints" :)

MadsBloendAndersen commented 2 years ago

@PerTHenriksen link to Miro flow chart:

https://miro.com/welcomeonboard/dDV2WnFZSzBPbDJ2TmNOVjJjTFJDdkxzdUJmY1B2aTZmbWJwUkFDZFZoN2U2WWxadnZzaDJxb1JzemZaM2R0Y3wzMDc0NDU3MzUzMzI4MjY3MTIy?invite_link_id=187988064481