Netflix / genie

Distributed Big Data Orchestration Service
https://netflix.github.io/genie
Apache License 2.0
1.71k stars, 367 forks

Fix exposing inconsistency in job status outside of persistence API #1152

Closed tgianos closed 2 years ago

tgianos commented 2 years ago

Within the job launch logic there was a helper method that queried the job status and, based on the returned value, either updated it or fell back to other logic. This works if all requests to the persistence service implementation go to a single consistent backend. If, however, read-only queries go to a read replica that may lag, or to some other implementation entirely, this breaks down without the service knowing why or how.

Moving this logic behind the persistence API, and letting the launch service act only on the job status returned by the source-of-truth API, should fix this problem.
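To illustrate the idea, here is a minimal sketch, not the code in this PR. All names (`PersistenceService`, `updateStatusIfCurrent`, `InMemoryPersistenceService`, `markAccepted`, and the reduced `JobStatus` enum) are hypothetical and chosen only for this example. It assumes the persistence layer can do the compare-and-update in one step against its authoritative store and return the resulting status, so the caller never issues a separate read that could be served by a lagging replica.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these types are Genie's actual API.
final class JobStatusSketch {

    // Reduced set of statuses, for illustration only.
    enum JobStatus { RESOLVED, ACCEPTED, FAILED }

    // The check-then-update lives behind the persistence API.
    interface PersistenceService {
        // Sets the status to `desired` only if it is currently `expected`,
        // and returns the status held by the source of truth afterwards.
        JobStatus updateStatusIfCurrent(String jobId, JobStatus expected, JobStatus desired);
    }

    // In-memory stand-in for the authoritative backend.
    static final class InMemoryPersistenceService implements PersistenceService {
        private final Map<String, JobStatus> primary = new ConcurrentHashMap<>();

        void save(String jobId, JobStatus status) {
            primary.put(jobId, status);
        }

        @Override
        public JobStatus updateStatusIfCurrent(String jobId, JobStatus expected, JobStatus desired) {
            // computeIfPresent performs the compare and the update atomically per key.
            JobStatus result = primary.computeIfPresent(
                jobId, (id, current) -> current == expected ? desired : current);
            if (result == null) {
                throw new IllegalArgumentException("No job with id " + jobId);
            }
            return result;
        }
    }

    // The launch side issues no separate status read; it acts only on the returned value.
    static JobStatus markAccepted(PersistenceService persistence, String jobId) {
        return persistence.updateStatusIfCurrent(jobId, JobStatus.RESOLVED, JobStatus.ACCEPTED);
    }

    public static void main(String[] args) {
        InMemoryPersistenceService persistence = new InMemoryPersistenceService();
        persistence.save("job-1", JobStatus.RESOLVED);
        persistence.save("job-2", JobStatus.FAILED);

        System.out.println(markAccepted(persistence, "job-1"));
        System.out.println(markAccepted(persistence, "job-2"));
    }
}
```

In this sketch the caller cannot distinguish "I moved the job to ACCEPTED" from "it was already ACCEPTED"; a real API would likely need to signal that difference, for example with an exception or a richer return type.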

coveralls commented 2 years ago

Coverage decreased (-0.01%) to 93.771% when pulling 3d165383d6a511e935ea1f20e3a21da47aa37acd on tgianos:fixLaunchStatusRace into e55800983b3c5f587fc7796578d90002c7378662 on Netflix:4.1.x.