bananaml / potassium

An HTTP serving framework by Banana
Apache License 2.0
97 stars 9 forks source link

Feature/track gpu status #31

Closed Peddle closed 1 year ago

Peddle commented 1 year ago

What is this?

Adds two fields to /__status__ endpoint: gpu_available: bool and sequence_number: int

gpu_available is true if the gpu lock is not locked. sequence_number increments by one every time a task acquires the GPU.

This PR also refactors how we're dealing with locks in potassium eliminating a few race conditions.

Why?

Using both of these, a load balancer can effectively determine the current state of a potassium deployment.

How did you test it works without regressions?

tests are passing + manual toy app is working as well (more thorough testing should be done with cover in prod before merging staging into main to ensure integration is safe)

If this is a new feature what may a critical error look like?

Things to consider to not repeat mistakes we've learned from many times

erik-dunteman commented 1 year ago

Overall:

  1. status endpoint looks good, no qualms there
  2. lock acquisition all good except the areas where we want to handle the background task done signal
  3. Not sure we need to change from the previous queue design. It was intended to be certain to have an event per background task release, not affect the rest of the app (doesn't lock the global lock), and indicated as private to users.
erik-dunteman commented 1 year ago

did a call offline - approved contingent on changes and test clearance