frequenz-floss / frequenz-api-dispatch

gRPC+protobuf specification and Python bindings for the Frequenz Dispatch API
https://frequenz-floss.github.io/frequenz-api-dispatch/
MIT License
1 stars 6 forks source link

Add Dispatch State #167

Open thomas-nicolai-frequenz opened 5 months ago

thomas-nicolai-frequenz commented 5 months ago

What's needed?

We need to be able to report the dispatch state.

Proposed solution

+// Enum DispatchState represents different stages of a dispatch's lifecycle, including its retrieval 
+// and execution states.
+enum DispatchState {
+  // UNKNOWN: Default status, unknown or not specified.
+  DISPATCH_STATE_UNKNOWN = 0;
+  
+  // PENDING_RETRIEVAL: Dispatch is created and waiting to be retrieved by the controller.
+  // This state is also set when an existing dispatch is updated, requiring re-retrieval to 
+  // incorporate changes.
+  DISPATCH_STATE_PENDING_RETRIEVAL = 1; 
+  
+  // RETRIEVED: Dispatch has been retrieved by the controller.
+  DISPATCH_STATE_RETRIEVED = 2;
+  
+  // STARTED: Dispatch has started execution.
+  DISPATCH_STATE_STARTED = 3;
+  
+  // COMPLETED: Dispatch has been executed and completed.
+  DISPATCH_STATE_COMPLETED = 4;
+
+  // WAITING: Dispatch is waiting to start its next scheduled execution.
+  DISPATCH_STATE_WAITING = 5;
+}

// Represents a microgrid dispatch with full details, including its ID, state and associated
// UTC timestamps.
message DispatchDetail {
  // Unique identifier of the microgrid dispatch.
  uint64 dispatch_id = 1;

  // The details of the dispatch.
  Dispatch dispatch = 2;

+  // Details of the dispatch current state.
+  DispatchState state = 3;

  // UTC Timestamp when the order was created.
-  google.protobuf.Timestamp create_time = 3;
+  google.protobuf.Timestamp create_time = 4;

  // UTC Timestamp of the last update to the order.
-  google.protobuf.Timestamp modification_time = 4;
+  google.protobuf.Timestamp modification_time = 5;
}

Use cases

No response

Alternatives and workarounds

No response

Additional context

No response

llucax commented 5 months ago

Playing devil's advocate here, shouldn't we have a more dynamic way to check if a dispatch is running? Otherwise it could lead to stale/misleading information. For example if a dispatch starts to execute, so it is updated to set the state to DISPATCH_STATE_EXECUTING, then the controller loses connection for a while longer than when the dispatch finishes to execute, then it will be still shown as DISPATCH_STATE_EXECUTING when it should probably be DISPATCH_STATE_UNKNOWN. Also if the actor executing the dispatch crashes, it could potentially get into DISPATCH_STATE_EXECUTING forever, or until the actor is restarted, in which case it is also not really executing.

About retrieved, what about updates? We create a dispatch, it is retrieved and marked as DISPATCH_STATE_RETRIEVED. If the dispatch is modified, should the state be set back to DISPATCH_STATE_PENDING_RETRIEVAL? Again, what if the dispatch actor crashes? All dispatches need to be re-retrieved on restart. Is it OK that they are kept marked as DISPATCH_STATE_RETRIEVED when they are not really retrieved by the (restarted actor in the) controller yet?

thomas-nicolai-frequenz commented 5 months ago

For example if a dispatch starts to execute, so it is updated to set the state to DISPATCH_STATE_EXECUTING, then the controller loses connection for a while longer than when the dispatch finishes to execute

You're right and you've been too fast :-) I just adjusted the above to accomodate for this.

thomas-nicolai-frequenz commented 5 months ago

About retrieved, what about updates? We create a dispatch, it is retrieved and marked as DISPATCH_STATE_RETRIEVED. If the dispatch is modified, should the state be set back to DISPATCH_STATE_PENDING_RETRIEVAL

Thats correct. I'll adjust the documentation.

Again, what if the dispatch actor crashes? All dispatches need to be re-retrieved on restart. Is it OK that they are kept marked as DISPATCH_STATE_RETRIEVED when they are not really retrieved by the (restarted actor in the) controller yet?

A crash does not necessarily imply that the dispatch is lost. The dispatch should be locally cached anyhow. Re-retrieval just does not change the state imho. However, I think this is less a discussion of the interface.

llucax commented 5 months ago

You're right and you've been too fast :-) I just adjusted the above to accomodate for this.

OK, yeah, STARTED seems more accurate. I still think the fact that this status is no dynamically retrieved and might be outdated/wrong is worth mentioning. But also not sure if the specs is the best place.

Also, we will potentially run into race conditions:

  1. A dispatch is created (PENDING)
  2. The actor polling starts and the dispatch is retrieved but the status is not yet updated (PENDING)
  3. Someone updates the dispatch (PENDING)
  4. The actor finishes processing the new dispatch and sets it to RETRIEVED

Until the next polling, the dispatch is marked as RETRIEVED when it is actually PENDING.

This also makes me think that we'll need a specific API call to change the status that doesn't update the update_time, otherwise the actor, when marking a dispatch as RETRIEVE (or STARTED, etc.) it will trigger an unnecessary update that it will have to consume itself on the next poll.

llucax commented 5 months ago

A crash does not necessarily imply that the dispatch is lost. The dispatch should be locally cached anyhow. Re-retrieval just does not change the state imho. However, I think this is less a discussion of the interface.

OK, so as I see it this is more like a log of the last action than a real status. So the actor will say "I just did this to this dispatch". In this case, the dispatch is marked as RETRIEVED because it was retrieved at least once, not because we know that is known by the dispatch actor at the precise instant the status is consulted, right?

I also wonder what if there are multiple dispatch actors retrieving dispatches, this status will only mean that at least one actor did some action to this dispatch, right?

However, I think this is less a discussion of the interface.

True, but I think it is important to validate that what we want to do with this interface is really useful and doable in the context of our architecture.

thomas-nicolai-frequenz commented 5 months ago

The actor polling starts and the dispatch is retrieved but the status is not yet updated (PENDING)

What do you mean by "not yet"? The API does change the status right away to RETRIEVED once the controller requests the dispatch from the API. Keep in mind these status is changed by the API and not the client.

thomas-nicolai-frequenz commented 5 months ago

I also wonder what if there are multiple dispatch actors retrieving dispatches, this status will only mean that at least one actor did some action to this dispatch, right?

hold on its about the controller having received it not actors.

llucax commented 5 months ago

Oh, OK, I thought the client needed to post status updates. So pending to retrieved will be handled by the server then, but the started, waiting and completed should be set explicitly by the client, right?

llucax commented 5 months ago

BTW, by the "the actor" I mean the "dispatch actor", the one querying the server, caching the dispatches and scheduling them to instruct other actors when a dispatch should be executed. So in this context "the actor" (the dispatch actor) should be the same as "the controller".

thomas-nicolai-frequenz commented 5 months ago

Oh, OK, I thought the client needed to post status updates.

Its both. DISPATCH_STATE_STARTED, DISPATCH_STATE_COMPLETED are submitted by the client.