Closed ddebrunner closed 5 years ago
Also an attempt to use the monitoring toolkit to build "smart" monitoring and recovery solutions for Streams, rather than forcing everything into the core product.
Example of output sent to Slack using the JobStatusService, PEFailedservice and SlackMessageService microservices combined together with no coding.
The initial version is part of the 2.0 release.
A microservice that detects if a PE has been stopped for a period of time (e.g. a minute). Publishes a tuple if it detects such a PE. Ideally it would differentiate between PEs that were manually stopped and those that failed and could not restart after 10 attempts.
Would use the job status microservice from #111.
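As a rough illustration of the detection described above, here is a minimal Python sketch. It is not the microservice's actual code: the class `StoppedPEDetector`, the `observe` method, the status strings `"stopped"` and `"running"`, and the shape of the returned dictionary are all hypothetical names chosen for this example. It assumes PE status samples arrive with a timestamp in seconds (for instance from the job status microservice) and reports a PE once when it has been continuously stopped for at least the threshold.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class StoppedPEDetector:
    """Hypothetical helper: tracks how long each PE has been continuously stopped."""

    threshold_s: float = 60.0  # "a period of time (e.g. a minute)"
    _stopped_since: Dict[str, float] = field(default_factory=dict)
    _reported: Set[str] = field(default_factory=set)

    def observe(self, pe_id: str, status: str, timestamp: float) -> Optional[dict]:
        """Feed one status sample; return an alert dict or None.

        The status values are assumptions for this sketch, not the
        values used by the real job status service.
        """
        if status != "stopped":
            # PE is no longer stopped: forget it so a later stop starts a new timer.
            self._stopped_since.pop(pe_id, None)
            self._reported.discard(pe_id)
            return None

        # Remember the first time this PE was seen stopped.
        first_seen = self._stopped_since.setdefault(pe_id, timestamp)
        stopped_for = timestamp - first_seen

        # Alert only once per continuous stopped period.
        if stopped_for >= self.threshold_s and pe_id not in self._reported:
            self._reported.add(pe_id)
            return {"pe": pe_id, "stopped_for_s": stopped_for}
        return None
```

Each non-`None` result would correspond to one published tuple. Telling manually stopped PEs apart from those that failed and exhausted their restart attempts would need extra information in the status sample, which this sketch does not model.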
Use cases (in separate microservices):
I have initial code, but it needs more work to be fully developed, e.g. to detect manually stopped PEs.