[Mandalorian/Phoenix/Titans] - ENABLER - Production-grade implementation of integration events

krmoos commented 1 year ago

Synopsis

As any DH3 stakeholder
I want a production-grade implementation of the event-driven design
So that business processes don't get stuck or go haywire
And problems are detected early through monitoring
And developers can quickly identify problems and make the system recover fast

Notes:

The feature might depend on a production-grade implementation of monitoring/logging/surveillance/alarms
A similar feature is required for point-to-point communication

Acceptance Criteria

[ ] It is known and implemented how to handle dead letters
[ ] Suitable logging/monitoring/alarms have been implemented in order to detect problems or anomalies early
[ ] Product teams use a shared DH3 platform (NuGet packages?) to publish and subscribe to integration events
[ ] The platform supports effectively-once delivery
[ ] The platform is resilient to service bus downtime or failures
[ ] The platform meets the requirements (performance, message size, ...) of wholesale calculation result publishing
[ ] The platform and domains support the intentions of ADR-008
[ ] The platform supports the NFRs (what are they?)
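
To make the dead-letter and effectively-once criteria more concrete, here is a minimal in-memory sketch in Python. It is an illustration only, not the DH3 platform's API: `IntegrationEvent`, `IdempotentConsumer`, `max_attempts` and the returned status strings are all hypothetical names, and a real implementation would persist the processed ids together with the handler's side effects and rely on the broker's own dead-letter queue.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class IntegrationEvent:
    event_id: str    # unique per event; used as the deduplication key
    event_type: str
    payload: dict


@dataclass
class IdempotentConsumer:
    """Hypothetical consumer: at-least-once delivery plus deduplication."""
    handler: Callable[[IntegrationEvent], None]
    max_attempts: int = 3
    # Assumption: in production this set lives in the same database (and
    # transaction) as the handler's side effects, not in memory.
    processed_ids: Set[str] = field(default_factory=set)
    attempts: Dict[str, int] = field(default_factory=dict)
    dead_letters: List[IntegrationEvent] = field(default_factory=list)

    def receive(self, event: IntegrationEvent) -> str:
        """Handle one delivery and report what happened to it."""
        if event.event_id in self.processed_ids:
            # Redelivery of an event that was already handled: acknowledge
            # it without running the handler again.
            return "duplicate"
        self.attempts[event.event_id] = self.attempts.get(event.event_id, 0) + 1
        try:
            self.handler(event)
        except Exception:
            if self.attempts[event.event_id] >= self.max_attempts:
                # Give up and park the event for manual inspection.
                self.dead_letters.append(event)
                return "dead-lettered"
            return "retry"
        self.processed_ids.add(event.event_id)
        return "processed"
```

In this sketch a duplicate delivery is acknowledged without side effects, and an event whose handler keeps failing ends up in `dead_letters` after `max_attempts` deliveries, which is the point where an alarm would be raised.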

Tech. Notes

See the product teams' initiative in Confluence.

Testability

[ ] Can be tested?
[ ] Can be demoed?
[ ] Verified by UX

How to test
Environment:
User:
Scenario:

rvplauborg commented 10 months ago

Bjarke pointed me to this epic to post a few thoughts on observability. We should set up tracing so that we can trace across services even when they communicate via asynchronous events rather than synchronous HTTP calls. This means events should carry an activity id and similar tracing attributes. It is also important to collect metrics such as the number of messages in queues and throughput, so we can discover when queues are growing and consumers cannot keep up.
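
A rough Python sketch of the idea, assuming the tracing attribute is a single correlation id stored in the event metadata. The envelope layout and the `publish`/`consume` helpers are hypothetical, not an existing DH3 or service bus API:

```python
import contextvars
import uuid
from typing import Any, Callable, Dict

# Correlation id of the action currently being handled in this service.
# Empty string means "no action in progress".
_correlation_id = contextvars.ContextVar("correlation_id", default="")


def current_correlation_id() -> str:
    return _correlation_id.get()


def publish(event_type: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Build an event envelope that carries the caller's correlation id.

    A new id is generated only when the event starts a new action.
    """
    correlation_id = _correlation_id.get() or str(uuid.uuid4())
    return {
        "metadata": {
            "event_id": str(uuid.uuid4()),
            "event_type": event_type,
            "correlation_id": correlation_id,
        },
        "payload": payload,
    }


def consume(envelope: Dict[str, Any], handler: Callable[[Dict[str, Any]], Any]) -> Any:
    """Run the handler with the envelope's correlation id as ambient context."""
    token = _correlation_id.set(envelope["metadata"]["correlation_id"])
    try:
        return handler(envelope["payload"])
    finally:
        # Restore the previous context so ids do not leak between messages.
        _correlation_id.reset(token)
```

Any event published from inside a handler then inherits the id of the event that triggered it, so a log search on that id would show the whole chain across services.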

MadsDue commented 10 months ago

Agree with @rvplauborg. We have done this previously by ensuring that we track a correlation id across all domains for the same "action".

It makes it easier to identify the error in the logs and to see all events leading up to the error for the specific action.

BjarkeMeier commented 10 months ago

@rvplauborg and @MadsDue, I'm not sure how the architects want to carve out these features. But I've just now created another one for tracing and diagnostics logging. https://app.zenhub.com/workspaces/epic-board-6375df2fd6f08e0015e1e0e6/issues/gh/energinet-datahub/green-energy-hub/489

mogensjuul commented 8 months ago

@krmoos @rvplauborg I have no idea what the status of this one is. Can you help?

rvplauborg commented 8 months ago

Hi @mogensjuul. I'm not really involved in this task, apart from the one comment I wrote about observability, so I'm afraid I can't give you an answer.

Energinet-DataHub / green-energy-hub