frmscoe / General-Issues

This repo exists to track current work and any issues within the FRMS CoE

Investigate Tazama processor resilience when core services (Redis, ArangoDB, NATS) are interrupted #387

Open Justus-at-Tazama opened 3 months ago

Justus-at-Tazama commented 3 months ago

Hypothesis: When the Redis, NATS, or ArangoDB services are interrupted during production operation and then restored, the connections from the Tazama processors do not resume seamlessly.
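One way to gather evidence for or against the hypothesis is to poll a processor's connection at a fixed interval while the backing service is stopped and restarted, then check whether successful calls return after the outage. The following is only a minimal sketch under stated assumptions: `Probe`, `observe`, `classify`, and the verdict labels are hypothetical names for illustration and are not part of the Tazama codebase, and the probe body (for example a round trip through the processor's existing Redis, NATS, or ArangoDB client) is left to the caller.

```typescript
// Hypothetical test harness: none of these names come from the Tazama codebase.
type Sample = { at: number; ok: boolean; error?: string };

// A probe is any round trip through the processor's existing client,
// supplied by the caller. It resolves on success and rejects on failure.
type Probe = () => Promise<void>;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Call the probe `samples` times, `intervalMs` apart, and record each outcome.
// `at` is the sample index, not a wall-clock timestamp.
async function observe(
  probe: Probe,
  samples: number,
  intervalMs: number,
): Promise<Sample[]> {
  const timeline: Sample[] = [];
  for (let i = 0; i < samples; i++) {
    try {
      await probe();
      timeline.push({ at: i, ok: true });
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      timeline.push({ at: i, ok: false, error });
    }
    await sleep(intervalMs);
  }
  return timeline;
}

type Verdict = "no-outage-observed" | "resumed" | "did-not-resume";

// Classify a timeline. Assumption: "resumed" means at least one probe failed
// and the final probe in the run succeeded.
function classify(timeline: Sample[]): Verdict {
  const firstFailure = timeline.findIndex((s) => !s.ok);
  if (firstFailure === -1) return "no-outage-observed";
  const last = timeline[timeline.length - 1];
  return last?.ok ? "resumed" : "did-not-resume";
}
```

A run would start `observe` against a healthy service, interrupt the service partway through, restore it, and pass the resulting timeline to `classify`. A `did-not-resume` verdict supports the hypothesis for that service, and the recorded `error` strings give a starting point for documenting the failure conditions.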

Acceptance criteria

  1. Prove/Disprove the hypothesis
  2. Document failure conditions
  3. Present/Demonstrate failure conditions at the TSC
  4. Follow TSC process to design and prioritise a solution

rtkay123 commented 3 months ago

@vorsterk we're sorted here, please confirm?

Justus-at-Tazama commented 2 months ago

Testing is suspended pending contracted capacity. Also, given the recent changes across the processors, we should test this thoroughly and systematically.