eclipse / kuksa.val

Apache License 2.0

(Better) feedback on actuation / set of actuators #560

Open SebastianSchildt opened 1 year ago

SebastianSchildt commented 1 year ago

This should serve as a discussion/starting point for designing better support for actuators in the API.

For most use cases involving actuation, it might be good if you can optionally get some feedback.

Current situation: You can write target values, but you don't know whether that works. You DO get feedback in terms of "Unknown datapoint" or "No Access", but a successful set only indicates that databroker accepted it and stored it successfully in its in-memory database. Whether the intended change is happening in the vehicle is unknown on the API level.

You may observe current_value in your app and decide with some custom logic whether the actuation was successful or not.
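The custom-logic approach above could be sketched as follows. The stream of updates is simulated here (a list of timestamped values); in a real app it would come from a subscription to the signal's current value, and `wait_for_actuation` is a hypothetical helper name, not an existing API:

```python
import time

def wait_for_actuation(updates, target, timeout_s, tolerance=0.0):
    """Consume (monotonic_time, current_value) updates until the current
    value matches the target (within tolerance) or the timeout expires.
    Returns True on observed success, False on timeout or end of stream."""
    deadline = time.monotonic() + timeout_s
    for ts, value in updates:
        if ts > deadline:
            return False  # timed out before the vehicle reached the target
        if abs(value - target) <= tolerance:
            return True   # observed the actuator reaching the target
    return False          # stream ended without reaching the target
```

This is exactly the kind of per-app boilerplate (plus the timeout policy decision) that the discussion below tries to move behind the API.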

The missing functionality is an easy-to-use way of knowing whether something failed (and/or finished). As this depends on the southbound side, only the provider can be the source of this truth. With the current design, where providers and consumers are completely asynchronous, there is no "contract" about if and when a provider might pick up a consumer's set (currently it does so via subscribing to target_value), and thus there is also no way to give any kind of useful error information back to consumers.

The problem is also hard to solve generically, because things like timeouts are very use-case dependent.

We should agree on API extensions/modifications to the kuksa.val.v1 interface. If at all possible, this should be a backwards-compatible extension.

In terms of return codes/feedback, we might need something like

My initial feeling is that the set API is OK; it might just need some new errors defined.

For providers we likely need a slightly different concept. One solution might be that they open a gRPC bidi stream, "claiming" to provide a certain datapoint, and then get requests from databroker.

Rough sketch:

```
Consumer                          Databroker                      Actuation provider

                                              <-- open gRPC stream:
                                                  provide(Vehicle.SomeTrunkActuator)
--- rpc actuate(Vehicle.SomeTrunkActuator, true) -->
                                  ProviderAvailableAndConnected?
                                              --- actuate(Vehicle.SomeTrunkActuator, true) -->
                                                                      DO_MAGIC
                                              <-- WriteResult in stream
<-- Return Result or Error
Be confident actuation worked (or not)
```

Note: With parallel requests, databroker needs to handle and match these somehow. Also, do we need some timeouts? Not "use case specific" ones, but on a technical level, to prevent something getting stuck?

On that note, it is super important that we do not "overcomplicate" things for writers of providers (and consumers). It is a matter of design "how diligent" a provider is in checking results: depending on how it is connected to the vehicle system, its own methods may be limited, and in more "classic" use cases/control loops, where you send a low-level actuator command 30 times a second, you would not care on an individual level anyway. So it should be easy (in terms of lines of code/complexity) for an actuation provider to "just return OK".
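To illustrate how small the "just return OK" case could stay, here is a sketch of a minimal provider loop. The queues stand in for the proposed databroker<->provider bidi stream, and all names and message shapes are hypothetical, not an existing kuksa.val API:

```python
import queue

def run_simple_provider(requests, replies):
    """Minimal actuation provider: read actuation requests from the
    (simulated) stream and immediately acknowledge each with "OK",
    without tracking the actual actuation result."""
    while True:
        req = requests.get()
        if req is None:              # stream closed by databroker
            break
        path, value = req
        # The southbound call that actually triggers the actuation
        # would go here; this sketch omits it.
        replies.put((path, "OK"))    # "just return OK"
```

A more diligent provider would replace the last line with result tracking, but nothing in the design would force it to.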

We also might want to think about what our recommendation is for actuations that take longer (e.g. multiple seconds). Will we always block the set? Or just return OK when the actuation is in progress? Or is this up to the provider implementer to decide? Do we then have a recommendation towards provider implementers for how long these "should" take to answer?

erikbosch commented 1 year ago

I can see that we possibly need to support two types of calls, possibly for the same signal:

The reason is that some clients may want to follow up on the (long-term) result, others not. So that is the first decision: do we need to support "both types"?

Then I assume that we want the databroker to be relatively stupid. I assume the databroker must know if there is a "smart provider", i.e. if databroker actually can expect an explicit response. So maybe a "smart provider" needs:

I.e. databroker needs to keep the gRPC call open until it either gets a positive/negative response from the "smart provider" or the default or signal-specific timeout occurs. If no "smart provider" has registered for the signal, maybe the databroker shall report something like ACCEPTED to indicate that the set request was accepted, but no result on whether it will succeed will be provided.
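That databroker-side behaviour could be sketched like this (all names are hypothetical placeholders, not the actual databroker implementation; `provider_ack` stands in for the smart provider's reply on its stream):

```python
import asyncio

async def handle_set(provider_ack, timeout_s):
    """Keep the consumer's call open until the "smart provider" answers,
    or a default/signal-specific timeout fires. If no smart provider is
    registered (provider_ack is None), answer ACCEPTED immediately:
    the request was taken, but no result will be reported."""
    if provider_ack is None:
        return "ACCEPTED"
    try:
        return await asyncio.wait_for(provider_ack, timeout_s)
    except asyncio.TimeoutError:
        return "TIMEOUT"  # reported back to the consumer as an error
```

The timeout here is the purely technical one discussed above (preventing stuck calls), not a use-case-specific one.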

Here I assume that the Databroker needs to be able to keep multiple requests for the same actuator active in parallel. I.e. it is up to the "smart provider" to decide what shall happen: whether it will perform the two operations one after the other, reject the second request, or interrupt the first request and give the second one priority.

argerus commented 1 year ago

Good points raised above.

I do agree that one main goal should be that the solution should not "overcomplicate" things for either providers or consumers.

That is actually a main point of introducing this as it would enable consumers to act on failures to set actuators (try again or propagate this error).

I also agree that it could make sense to differentiate between when an actuation is considered complete, both in terms of what a consumer is interested in knowing and in terms of what providers are able to communicate (i.e. they may not be able to monitor the progress past initiating the actuation). Arguably, if a provider is not able to monitor the progress, the fallback could just be to monitor the actual value by subscribing to it (using databroker), while retaining the responsibility to decide when/if timeouts should occur.

Fire and forget - give a result as soon as Databroker (or server) has accepted the request. As today.

I do not agree that databroker should ever return a positive response if there is no provider available for a particular actuator. That behaviour is currently causing unnecessary complexity in both providers and consumers. If there are no providers available when a client is trying to set an actuator, it should be informed of this in order to decide what to do next (try again or propagate the error for example). And if there is no provider available when a "set actuator" request is issued, providers connecting at a later point should not be triggered by these "pending" requests as there is no way to know if they are still valid.

A minimum requirement for providers would thus be that they inform databroker of their existence in order for databroker to forward actuation requests to them. I believe it would be beneficial if they were also required to acknowledge when they have received such a request and "initiated" the actuation, as this is something that should be possible even for the most limited providers. This would benefit the consumer side of the API in that consumers can rely on being informed if there are any errors initiating the actuation.

SebastianSchildt commented 1 year ago

I do not agree that databroker should ever return a positive response if there is no provider available for a particular actuator. That behaviour is currently causing unnecessary complexity in both providers and consumers

Makes sense to me: a response by databroker, "I hear you, but currently no provider is registered for that path", makes sense. However, would the value be dropped, or would it be stored as "target" anyway, in case a provider comes around later (maybe it was not yet started, crashed, or was otherwise "indisposed") and might act on it? (Via the timestamp it could decide whether to act on it, but that would be up to the provider.) Basically databroker being maximally "dumb/generic": telling the consumer upon set "hey, currently no provider", but still telling any late-joined provider (or even a "consumer" just reading/subscribing to a target state), "Yup, the last known desired target state for that signal was 'true'; I got that last Thursday".

argerus commented 1 year ago

However would the value be dropped, or would it be stored as "target" anyway, in case a provider comes around later

I think dropping the value would absolutely be the best option (given the complexities introduced by opting to "cache" it).

Firstly, that would mean users can reason about "set actuator" as a request / response concept. And this (in my experience) is what most users expect when trying to set an actuator. If the request would be "cached" indefinitely, the actuation could happen at any point in the future, which could easily lead to unintended consequences. Silly example: If I press the "unlock the car" button in my parking lot and it doesn't work, I probably don't want it to open randomly when I'm no longer there. The user sending the initial request is best suited to decide whether it makes sense to retry that request at a later point.

Secondly, storing the target value introduces complexity in the providers if they need to figure out if the request is current or if it's just some historic request (as you are hinting at in your description). The providers probably don't have access to the information needed to know whether it's desirable to try again. It also introduces a form of complexity in consumers, as they don't know if an existing target value means that any actuation is ongoing. If there is already a target value present, should the client set it (to the same value) again in order to trigger providers waiting for an up to date request? I don't really see anything gained by adding this complexity / ability.

One thing that could be valuable, though, is to introduce a way for consumers to subscribe to the status of actuators (i.e. the availability of providers), to use if they want to retry when a provider comes online. As I said, the client/consumer is probably best suited to decide whether to try again, as it was the one triggering the request in the first place (and thus has access to the conditions under which it should be triggered).
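Such a consumer-side retry flow might look like the following sketch, where `actuate` and the provider-status event stream are hypothetical placeholders for the proposed API (simulated here with a callable and an iterable):

```python
def actuate_with_retry(actuate, provider_events, path, value, max_retries=1):
    """Try to actuate; on PROVIDER_UNAVAILABLE, wait for a
    "provider online" event for that path and retry a bounded
    number of times. The retry decision stays with the consumer."""
    result = actuate(path, value)
    retries = 0
    while result == "PROVIDER_UNAVAILABLE" and retries < max_retries:
        for event_path in provider_events:   # blocks until next status event
            if event_path == path:
                break                        # our provider came online
        else:
            return result                    # event stream ended; give up
        result = actuate(path, value)
        retries += 1
    return result
```

Note that the consumer, not databroker, decides whether the retry still makes sense at the later point in time, which is exactly the argument against caching target values.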

lukasmittag commented 1 year ago
Here I assume that the Databroker needs to be able to keep multiple requests for the same actuator active in parallel. I.e. it is up to the "smart provider" to decide what shall happen, if it will perform the two operations after each other, or if it will reject the second request or interrupt the first request and give the second priority.

Adding to this: this could lead to another thing we could add. Here databroker could give some clients a higher priority and tell the providers about it. Maybe this complicates things, and maybe it is not in scope of databroker itself (for now), but as Erik mentioned priorities, this came to my mind. It's more a comment than a suggestion.

One thing that could be valuable though, is to introduce a way for a consumers to subscribe to the status of actuators (i.e. availability of providers), to use if they want to retry when a provider comes online.

I think this would be really good, as it could be combined with feeding back some properties, like how fast the expected response time might be. I think @SebastianSchildt linked those issues too, but mentioning it again :)

rafaeling commented 1 year ago

I came across this issue during today's meeting and would like to contribute some different/similar perspectives as an external observer and a novice in this context.

Firstly, I also find the current response provided by the databroker to be quite confusing. Considering that there will be providers responsible for determining whether a Set operation succeeded on the actuator, I agree that it would be beneficial to start by adding more semantics to the Set response.

One possible approach would be to introduce an additional field "ProviderResponse" (containing values such as NewValue, UpdateInProgress, Error) to the existing Set response provided by the databroker (which currently includes values like Ok, Error, No Access, etc.). This would provide consumers with a meaningful response regarding the ongoing process on the other side, enabling them to decide whether to retry the set operation. I consider it important to empower the consumer to take deterministic and robust actions depending on the Set response information; this would reduce the uncertainty about whether a value was set correctly.

Furthermore, if the consumer wants to receive constant updates from a signal, that's precisely what the "subscribe" method is intended for. This would eliminate the need for additional overhead to the "set" method such as blocking calls, waiting, timeouts, etc. Additionally, if we need to know the status of a provider, the "subscribe" method could also include the response status of the provider, in case there is no value to be transmitted to the consumer.

From what I understand, the databroker simply interconnects components. Therefore, I agree with having comprehensive error semantics in place, rather than introducing more methods and complexity to the databroker (like value caching, etc.).

rafaeling commented 1 year ago

Here is an initial proposal. I have introduced a new enum called ValueUpdateStatus in the DataEntry message. The purpose of this new field is to provide the response messages (GetResponse and SubscribeResponse) with a status indicator. By doing so, we gain valuable insight into the state of the system following the execution of a 'Set' operation.

The ValueUpdateStatus enum mainly looks like this:

UPDATE_SUCCESS: Indicates a successful update operation on the actuator.
UPDATE_IN_PROGRESS: Indicates that the update operation is still in progress.
UPDATE_ERROR: Indicates a generic error during the update operation.
UPDATE_TIMEOUT: Indicates a timeout occurred during the update operation.
UPDATE_INVALID_VALUE: Indicates that the new value provided for the update is invalid.
UPDATE_PROVIDER_UNAVAILABLE: Indicates that the update operation is currently unavailable or not permitted due to the absence of a provider's subscription.
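For illustration, the proposed status values could be mirrored in client code, e.g. as a Python enum (a sketch of the proposal; the actual protobuf definition is not shown here, and `is_terminal` is a hypothetical helper):

```python
from enum import Enum

class ValueUpdateStatus(Enum):
    """Proposed update-status values, named as in the proposal above."""
    UPDATE_SUCCESS = 1
    UPDATE_IN_PROGRESS = 2
    UPDATE_ERROR = 3
    UPDATE_TIMEOUT = 4
    UPDATE_INVALID_VALUE = 5
    UPDATE_PROVIDER_UNAVAILABLE = 6

def is_terminal(status):
    """A consumer would typically wait until the status is terminal,
    i.e. anything other than UPDATE_IN_PROGRESS."""
    return status is not ValueUpdateStatus.UPDATE_IN_PROGRESS
```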

One of the benefits of introducing this new field in DataEntry is that the Database structure we have in the broker contains a hashmap that actually holds all the DataEntry objects along with their corresponding new values. https://github.com/eclipse/kuksa.val/blob/ce8b6526492e69a3c4217a5651be00cc5109fbad/kuksa_databroker/databroker/src/broker.rs#L90

This means that, for example, if some update operation fails, we can retrieve and update the status in the databroker's database. Consequently, if a client wants to know the status of its operations, the broker will return the last status stored, giving the client control over its performed operation. The API protobuf would look like this:

[Attached image: "broker dot"]

I have also been considering creating an additional protocol diagram, similar to the TCP protocol or the ONVIF protocol for cameras. This diagram would help users understand the correct process to update values on actuators and check whether the operations have been successfully performed.

lukasmittag commented 1 year ago

@rafaeling and I had a quick discussion. A quick summary:

I think we need to prototype something and then discuss and have a look at it. Therefore we should keep it simple for now. I think the ValueUpdateStatus addition is a good starting point.

argerus commented 1 year ago

While being able to subscribe to the status of actuators would be a useful feature, I don't think it's really addressing the main issue at hand here.

In order to even have this feature, something along the lines of what @SebastianSchildt showed in the table above would be needed, i.e. at a minimum: having an open GRPC connection (stream) between databroker <--> provider.

Without this, there is no way for databroker to know whether a provider is available, and it would not be possible to maintain a correct state of the actuator status.

Secondly, with the current proposal, I think it would become unnecessarily complex for clients to determine whether a "set actuator" request was successful.

(I agree with the observation that there should probably be two different definitions of when a "set actuator" should be considered complete, but let's consider the simplest case first: it is considered complete/successful once the actuator provider has acknowledged the command to actuate, which would typically be done just after the actuation is initiated.)

If I understand the proposed usage of value_update_status here, setting an actuator and getting this type of feedback would require a client to:

  1. Subscribe to the actuator status.
  2. Set the actuator target.
  3. Wait for the actuator status to change into either UPDATE_IN_PROGRESS or UPDATE_SUCCESS.
  4. Read the final actuator status (i.e. propagate the last subscribe message to the original call site) in order to determine if it was a success or not.

In pseudo code, something like (ignoring synchronization):

```python
set_actuator_target_called = False

# start the subscription in a separate thread
for message in client.subscribe(Window.Front.Left.Position, FIELD_ACTUATOR_STATUS):
    if not set_actuator_target_called:
        continue  # not caused by us yet
    elif message.actuator_status in (UPDATE_SUCCESS, UPDATE_IN_PROGRESS):
        result = SUCCESS
    else:
        result = FAILURE

# in the original thread, set the actuator target
client.set(Window.Front.Left.Position, FIELD_ACTUATOR_TARGET, 20)
set_actuator_target_called = True  # possible race condition

# wait (somehow, probably involving threads / condition vars) for result to be populated
print(result)
```

I could be missing something, but I would not categorize this as being easy to use.

Having the "set actuator" call block until the actuation has been acknowledged by the provider (or databroker responding with an error in case of no provider) seems like a much simpler API.

```python
result = client.actuate(Window.Front.Left.Position, 20)
# ^ blocks until actuator has acknowledged actuation start (or failure)
print(result)
```

and perhaps (at some point) even have a complementing function for when the client wants to wait until the actuation has fully completed.

```python
result = client.actuate_to_completion(Window.Front.Left.Position, 20)
# ^ blocks until actuator has reached the desired state (or failed)
print(result)
```

Perhaps these "convenience functions" could be implemented in the Python library, client side. But it would still be an unnecessarily complex implementation, which would need to be done for every language library wanting this. I think it would be a lot smarter to have this implemented once, databroker side, exposed as a ready-to-use gRPC function.

The expression

When all you have is a hammer, everything looks like a nail.

comes to mind, when looking at solving this using only Set, Get & Subscribe.

I do understand that this was done in order to keep the API "simple". But at some point, trying to fit everything into a "too simple" model will end up making it more complex.

PS

Let me just repeat that having actuator status as a field (i.e. metadata) that can be subscribed to can still be a useful feature that is not incompatible with having dedicated actuate and provide functions in the API.

argerus commented 1 year ago

I just want to make sure we are on the same page (in terms of understanding the current behaviour of the API).

rather than introducing more methods and complexity to the databroker(like values caching, etc)

When I'm talking about "caching", I'm describing the current behaviour of kuksa.val.v1 API as it now exists.

The current way to "set actuator" is to use the generic Set function to set the actuator_target field. When this field is set, its value will persist indefinitely; in other words, it is conceptually a "cached" "set actuator" request.

This doesn't match the concept of "set actuator" which is fundamentally a momentary request that would only make sense at the time it is made (in general).

This conceptual mismatch is something I think we should address going forward. Personally, I'm convinced that the best way to address it is by adding a dedicated function for this concept (actuate or set_actuator) that would match the expectations of calling such a function. One such expectation, is that calling actuate would actually return whether or not the actuation was successful(ly initiated) or not. And that it would not cache the request indefinitely (and perhaps randomly trigger at some point in the future).