Closed kevinherron closed 11 months ago
I'm not convinced it has to be this complicated any more. Someone asked about this by email recently and I had a much shorter response:
It does not seem possible to accomplish this right now, but it's close. The idea would be to use the TransferSubscription service to transfer the previous Subscriptions to the new Session established by the redundant client. All that's missing right now is some way to pass the UaSubscription objects from the old OpcUaSubscriptionManager object to the new one. Right now there is only immutable access to the subscription objects.
I wonder if it can be as simple as that. If the transfer succeeds, you just pass the objects to the new `OpcUaSubscriptionManager`, and basically nothing else needs to be done. The event/value consumer callbacks don't even need to be hooked up again or anything.
`OpcUaSubscriptionManager` would just need a method like:
public CompletableFuture<???> transferFrom(OpcUaSubscriptionManager otherManager) {
....
}
Just need to figure out a nice API that indicates which subscription transfers failed vs. which succeeded, and possibly have another overload that lets you transfer only some subscriptions.
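A minimal sketch of one possible result shape for such an API, assuming the transfer yields one status code per requested subscription id. `TransferOutcome` and `partition` are hypothetical names, not part of Milo, and status codes are modelled as plain `long` values rather than Milo's `StatusCode` type.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical result type for a transfer call; not part of Milo. */
final class TransferOutcome {
    private final List<Long> succeeded = new ArrayList<>();
    // subscriptionId -> raw status code of the failed transfer
    private final Map<Long, Long> failed = new LinkedHashMap<>();

    /** Pairs each subscription id with the status code at the same index. */
    static TransferOutcome partition(List<Long> subscriptionIds, List<Long> statusCodes) {
        if (subscriptionIds.size() != statusCodes.size()) {
            throw new IllegalArgumentException("one status code per subscription id expected");
        }
        TransferOutcome outcome = new TransferOutcome();
        for (int i = 0; i < subscriptionIds.size(); i++) {
            long id = subscriptionIds.get(i);
            long code = statusCodes.get(i);
            // OPC UA status codes carry the severity in the top two bits; 00 means Good.
            if ((code & 0xC0000000L) == 0) {
                outcome.succeeded.add(id);
            } else {
                outcome.failed.put(id, code);
            }
        }
        return outcome;
    }

    List<Long> succeeded() {
        return Collections.unmodifiableList(succeeded);
    }

    Map<Long, Long> failed() {
        return Collections.unmodifiableMap(failed);
    }
}
```

A `CompletableFuture<TransferOutcome>` return type would let callers see exactly which subscriptions still need to be recreated from scratch.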
Ah, the original response I wrote assumes you want to take over subscriptions from an entirely different process/JVM, and the newer, simpler one is from the same process.
Yeah, I think the user story I'm trying to satisfy is something like "client on machine A gets its power cord unplugged and then client on machine B takes over and no notifications are missed."
In actuality the clients will be part of an Akka cluster singleton: if one node goes down, the cluster chooses the next-oldest node and creates a new client on that node. I'm then trying to have it resume any existing subscriptions that the old instance was monitoring.
What you suggested did make me start thinking about other approaches, but they would require write access to the server. I was considering the idea of serializing the subscription manager to binary, then creating a new node and writing the serialized value there. Then when the new client came up it could read and deserialize that and create the subscription manager. It would still need to call transfer subscription and also wire up the callbacks, but it would be able to tell if the monitored items are events or values and could avoid the wiring up of both that I originally suggested.
Not sure if requiring write access is a good solution, as there may be implementations where that is not allowed. If there is a simple redundant datastore, it could be leveraged to hold the serialized subscription manager, but I don't know of one so simple that it would be transparent to the user.
I doubt you could successfully serialize the `OpcUaSubscriptionManager` and reconstitute it in any meaningful way.
I think if you want true redundancy across processes/nodes/machines it's going to require some out of band state synchronization in addition to whatever changes are necessary in Milo. I don't think it's possible to re-create the state or field values of `OpcUaSubscription` and `OpcUaMonitoredItem` using only diagnostic info available from the server. I also don't think the approach is that great because it relies on the server implementing and having diagnostics enabled.
I'm going to let this bounce around in my head a bit... if you've got any ideas feel free to share.
I initially felt the same way about relying on the diagnostic info, but looking at "OPC 10000-4 Section 6.6.3" it seems like this is how subscription transfer was envisioned:
6.6.3 Client Redundancy Client Redundancy is supported in OPC UA by the TransferSubscriptions Service and by exposing Client information in the Server diagnostic information. Since Subscription lifetime is not tied to the Session in which it was created, backup Clients may use standard diagnostic information available to monitor the active Client’s Session with the Server. Upon detection of an active Client failure, a backup Client would then instruct the Server to transfer the Subscriptions to its own session. If the Subscription is crafted carefully, with sufficient resources to buffer data during the change-over, data loss from a Client Failover can be prevented.
OPC UA does not provide a standardized mechanism for conveying the SessionId and SubscriptionIds from the active Client to the backup Clients, but as long as the backup Clients know the Client name of the active Client, this information is readily available using the SessionDiagnostics and SubscriptionDiagnostics portions of the ServerDiagnostics data. This information is available for authorized users and for the user active on the Session. TransferSubscriptions requires the same user on all redundant Clients to succeed.
After reading this I went ahead and made sure that the client could browse the session diagnostics summary and filter for clients that match an expected name with a tickcount appended, and then sorted by that name to get the most recent. From there I query the subscription diagnostics for that session.
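A rough sketch of that name filter, assuming the session names have already been read from the session diagnostics and that the tick count is a plain decimal suffix. `SessionNameFilter`, `mostRecent`, and the name format are illustrative assumptions. The sketch compares the suffix numerically rather than sorting the names as strings, since a lexicographic sort would misorder tick counts with different digit counts.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper; session names are assumed to look like "<prefix><tickCount>". */
final class SessionNameFilter {

    /** Returns the matching session name with the highest tick-count suffix, if any. */
    static Optional<String> mostRecent(List<String> sessionNames, String expectedPrefix) {
        return sessionNames.stream()
            .filter(name -> name.startsWith(expectedPrefix))
            .filter(name -> isTickCount(name.substring(expectedPrefix.length())))
            .max(Comparator.comparingLong(
                (String name) -> Long.parseLong(name.substring(expectedPrefix.length()))));
    }

    // Accept only non-empty digit strings short enough to fit in a long.
    private static boolean isTickCount(String suffix) {
        return !suffix.isEmpty()
            && suffix.length() <= 18
            && suffix.chars().allMatch(Character::isDigit);
    }
}
```

The subscription diagnostics would then be queried for whichever session this returns.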
This approach does have a failure mode I believe. If the data is not queried quickly enough it can be lost. However, it appears that this information stays alive for a pretty reasonable amount of time, and my use case is on the order of seconds at most, which seems well within that time period.
Alternatively, going out of band also has its own failure modes: if the subscription is updated and the client dies before the outside store is updated, there will be integrity issues.
Of course, if there is a stronger solution I'm fine with deviating from the spec and this approach, but the spec does seem like a strong case to proceed using diagnostics.
Yeah, I think I read that entry in the spec when I wrote the original response. Relying on diagnostics for whatever information you can find is sound if the server has it.
You already mentioned the problems with this, but what I mostly meant by synchronizing state out-of-band being necessary is all the other pieces of info you'd need to create the `OpcUaSubscription` and `OpcUaMonitoredItem` instances on the new node.
Re: point 7, if you have the `ReadValueId` of the item (via out-of-band means, or however this ends up happening), you can determine if the callback should be an `EventConsumer` or `ValueConsumer` by just looking to see if the attribute is `AttributeId.EventNotifier`...
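As a small illustration of that check, here is a sketch that works on the raw numeric attribute id instead of Milo's `AttributeId` enum. `ConsumerKind` and `forAttribute` are hypothetical names; the numeric ids 12 (EventNotifier) and 13 (Value) are the standard OPC UA attribute ids.

```java
/** Hypothetical helper that decides which kind of callback a monitored item needs. */
final class ConsumerKind {
    // Standard OPC UA attribute ids.
    static final int ATTRIBUTE_EVENT_NOTIFIER = 12;
    static final int ATTRIBUTE_VALUE = 13;

    enum Kind { EVENT, VALUE }

    /** Items monitoring EventNotifier deliver events; every other attribute delivers data changes. */
    static Kind forAttribute(int attributeId) {
        return attributeId == ATTRIBUTE_EVENT_NOTIFIER ? Kind.EVENT : Kind.VALUE;
    }
}
```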
So, I feel like I managed to talk myself in a full circle yesterday and arrived right back at what you originally posted.
You seem to have a clear grasp of how the spec says it should work; the problem is only that the spec doesn't acknowledge the realities of synchronizing whatever additional state is necessary for the APIs that each SDK might have for subscriptions and items. As far as the spec is concerned, with the minimal guidance it provides, yes, if you have the session id and subscription id you can technically transfer and then resume receiving data change notifications on another client node, but being able to match those data changes to NodeIds and AttributeIds, among other things, seems out of scope.
The sticky points really are 6 and 7, the `ReadValueId` and any other pieces of information needed to re-create the subscription and monitored item objects on the new client node.
Even if the Milo client subscription/item API was significantly altered to not require most of this information it still seems like the `ReadValueId` associated with each item is pretty necessary/important and I don't think there's any way to figure this out just by querying the server and its diagnostics.
Completely agree, matching to NodeIds and AttributeIds does not seem covered, sadly. It would be nice if `GetMonitoredItems` returned more information.
Looking at the `OpcUaSubscription` definition, it seems the contents of the subscription diagnostics array should satisfy the needs of the `OpcUaSubscription` ctor. And when the monitored items are recreated, the results of `GetMonitoredItems` should allow the population of `itemsByClientHandle` & `itemsByServerHandle`; `notificationListeners` I couldn't quite determine; I believe the `notificationSemaphore` is fine; and the last sequence number should be picked up from the `AvailableSequenceNumbers` in the `TransferResult` for that subscription. I think the `requested` fields would have to default to the `revised` fields.
I think this sort of underlines your problem statement and draws attention to the fact that later `modifySubscription` calls could not occur after a transfer, because when the new client tries to modify a subscription, how will it know which transferred subscription to use?
Let's, for example, say there is a product that allows users to configure subscriptions for an application. An initial instance starts up, sees there are no previous subscriptions for that application, and subscribes directly using its configuration for 3 subscriptions. It later fails and a 2nd client is instantly spawned. At startup the 2nd instance sees that the last instance (having 3 configured subscriptions) failed and that the new instance should transfer the subscriptions of the failed client. It does so, but now how does it know which subscription goes with which configuration? When there is a future update to the publishing interval of one of the subscriptions, which subscription do we apply the change to? If we change the filtering of one of the monitored items, which subscription does that get associated with?
There are a few ways to tackle this problem off the top of my head, none of them perfect, nor do I think there is a perfect solution to this. One way is a `Map` of some sort that would map OPC UA subscription ids to application config ids, maybe with a timestamp. From there, if the timestamps in the `Map` match the version range in the current client's config, it could probably recreate the `ReadValueId` and other missing information to close that loop. If the timestamps do not match the version range then I believe transferring the subscription should be avoided as there has been a configuration change.
Another potential approach would be to require that subscriptions are sorted by some constant and then added synchronously if an end user wants to utilize subscription transfer. Then when the new client takes over it could match the sorted configurations to the subscription diagnostics array, which should be in the same order (that's if the server doesn't change the order for some weird reason).
Another potential approach would be to leverage the subscription priority and to ask users that want to use redundant clients to prioritize their subscriptions. This would result in more predictable subscription notifications; if priorities are the same the server round-robins the notifications. Of course, if a high-volume subscription is prioritized highly it could impact the performance of the lower-priority subscriptions, but I don't know to what extent. At subscription transfer, priority could just be matched and the client could go from there.
None of these are my favorite, but, maybe they get the ball rolling. I could be off base, maybe I'm seeing a potential issue that isn't there?
The first approach you outlined resonates with me the most, but that's probably because it's the most similar to how our application uses Milo (minus any kind of redundancy support).
Basically, our system has its own idea of what subscriptions and items should exist, and it passes them to the OPC UA layer to make it happen. I have some intermediate subscription object that tracks bits of info from both sides like:
`BiMap<OpcSubscriptionItem, UaMonitoredItem>` etc...
`OpcSubscriptionItem` is the receiver of the callbacks and has enough information on it to derive a `ReadValueId`.
If I had to figure out redundancy I'd add some more info like subscriptionId and maybe change the tracking of items to be ~stateless~ behavior-less objects rather than the callback targets themselves and keep those externally.
Then figure out how I could replicate this structure to the backup nodes. With the right info in this structure I could probably recreate the `OpcUaSubscription` and `OpcUaMonitoredItems`, with some minor modifications to Milo. I'd then implement the easy transfer subscription scenario in Milo assuming the transfer is happening in the same JVM, and then I could just pass in these subscriptions I just created from my synchronized structures, perhaps after some sanity checks to see if they still represent the most current configuration.
This is all stream of consciousness so I might totally be missing something here.
Your system and mine seem to be similar.
The challenge of what you discuss is the mapping of the structure to the backup nodes.
I think maybe there is an approach that does not require us to go out of band, but, I want to run it by you to see what you think, because, complicated.
When we call `createMonitoredItems` on a subscription we pass in an array of `MonitoredItemCreateRequest`, which has `MonitoringParameters`, which requires a `UInteger clientHandle` for each item.
I believe the range for `UInteger` is 0 to 4,294,967,295. I'm not 100% sure if the SDK or spec requires them to be unique per client, but let's impose that as a restriction for this solution if it is not already, which I think is reasonable.
At transfer, the new client should be able to match a UA Server's subscription Id to a configuration subscription by looking at the results when `GetMonitoredItems` is called for a given server subscription Id. That should return a list of client and server handles, and since we know which client handles go with which subscriptions, we should then know that server handle `24`, for example, goes with subscription configuration `foo` because it has client handle `50412` and so does configuration `foo`. From there we can iterate the results of `GetMonitoredItems` for each subscription and recreate what we need from our configuration.
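A sketch of that matching step, assuming the `GetMonitoredItems` output has already been read into two parallel lists of server and client handles and that client handles are unique across the whole client. `HandleMatcher` and its methods are hypothetical names; handles are modelled as `long` rather than Milo's `UInteger`.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical helper that ties a transferred subscription back to its configuration. */
final class HandleMatcher {

    /** Finds the configuration whose client handles are exactly the ones the server reported. */
    static Optional<String> matchConfiguration(Map<String, Set<Long>> handlesByConfig,
                                               List<Long> reportedClientHandles) {
        Set<Long> reported = new HashSet<>(reportedClientHandles);
        for (Map.Entry<String, Set<Long>> entry : handlesByConfig.entrySet()) {
            if (entry.getValue().equals(reported)) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }

    /** Pairs the parallel handle lists so each client handle maps to its server handle. */
    static Map<Long, Long> serverHandleByClientHandle(List<Long> serverHandles,
                                                      List<Long> clientHandles) {
        if (serverHandles.size() != clientHandles.size()) {
            throw new IllegalArgumentException("handle lists must have the same length");
        }
        Map<Long, Long> result = new LinkedHashMap<>();
        for (int i = 0; i < clientHandles.size(); i++) {
            result.put(clientHandles.get(i), serverHandles.get(i));
        }
        return result;
    }
}
```

With the example above, a subscription reporting client handle `50412` would resolve to configuration `foo`, and the second method would give server handle `24` for that item.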
It seems like this solution addresses the issues we've been discussing without needing to push mappings somewhere where other clients can consume them, simplifying things and removing some failure modes.
The spec doesn't enforce that constraint on client handles, but any sane client implementation should be using unique client handles per item, at least within the context of a single subscription... Hmm.
edit: my application assigns them globally unique, but the way subscriptions work and notifications are delivered only requires they be unique within each subscription, unless you're doing something weird and don't care that multiple items are assigned the same client handle. It's really up to the client. The server has its own unique id it assigns to the item and delivers in the response to creating an item. These server handles are only unique per subscription, not globally.
This cuts down on some of the out of band synchronization necessary, but what about `ReadValueId` or its equivalent information? How does the client taking over the subscriptions ever figure out what the NodeId, AttributeId, etc... for each monitored item were?
edit2: You could derive the ReadValueId if you stored enough information in your configuration `foo`, I guess. But then you're still dealing with some out of band sync to get the client handles and configurations to appear on both Nodes. I suppose you could just deterministically assign the client handles per subscription configuration, though. That might be enough...
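One way deterministic assignment could look, as a sketch only: pack the position of the subscription configuration and the position of the item into a single 32-bit handle, so every node derives the same handles from the same ordered configuration. `ClientHandles` and the 16/16 bit split are assumptions of this sketch, which limits it to 65,536 subscriptions with 65,536 items each.

```java
/** Hypothetical scheme for deriving client handles from configuration order. */
final class ClientHandles {
    static final int SLOT_BITS = 16;
    static final int SLOT_LIMIT = 1 << SLOT_BITS; // 65,536 entries per slot

    /** Packs both indices into one value that fits in an unsigned 32-bit handle. */
    static long handleFor(int subscriptionIndex, int itemIndex) {
        if (subscriptionIndex < 0 || subscriptionIndex >= SLOT_LIMIT
            || itemIndex < 0 || itemIndex >= SLOT_LIMIT) {
            throw new IllegalArgumentException("index out of range for a 16/16 split");
        }
        return ((long) subscriptionIndex << SLOT_BITS) | itemIndex;
    }

    /** Recovers the subscription configuration index from a handle. */
    static int subscriptionIndexOf(long handle) {
        return (int) (handle >>> SLOT_BITS);
    }

    /** Recovers the item index from a handle. */
    static int itemIndexOf(long handle) {
        return (int) (handle & (SLOT_LIMIT - 1));
    }
}
```

This only holds while both nodes see the configurations and items in the same order, so a configuration change between failover and transfer would still need to be detected separately.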
FWIW, after looking at this and discussing with you, my take on this is that client redundancy is under-spec'd and under-supported in the current revisions of the OPC UA spec and needs some work. Even just another method to get the rest of the missing information to recreate subscriptions and items might be enough to make it work.
So, `ReadValueId`s are required in the `MonitoredItemCreateRequest`, so it's reasonable to assume the configuration can provide these at transfer as they were provided when the subscription was initially created.
That is, as long as client handles are part of the configuration (they are for me) then the client handles can be mapped to the configurations and simply recreated at transfer.
I do agree with your second comment, but feel with client handles being included in the config and what was presented previously there is something pretty powerful.
Additionally, if the server returned more information on the `GetMonitoredItems` call (i.e. ReadValueId type info) I believe the amount of work that the SDK would need to perform would still be pretty close to the same as what would be necessary with this proposed solution.
I think you can start prototyping this approach with minimal changes in Milo at first. Everything discussed so far can happen outside the SDK, but there will be some changes necessary to allow externally created subscriptions to be added to the subscription manager and externally created items to be added to subscriptions.
And you'll have default/invalid values for a few of the "requested" parameters and the item's filter result I guess. But that's probably okay.
Just as an FYI I have begun the exercise of prototyping this out. If I hit any snags or if a significant amount of time passes I'll make sure to post any updates.
@kevinherron - would you be ok if I took a stab at implementing this as a PR and associating the commit with this enhancement request?
I agree, the server would need to have diagnostics enabled, otherwise the client should return an error.
Looking at the code, I think this enhancement can be achieved by adding logic to the UaSubscriptionManager/OpcUaSubscriptionManager for transferring a subscription. This could:
Of course I'd be completely open to any suggestions, thoughts, collaborations, or other points of interest you could share.
Is this something you'd be open to?
Originally posted by @marcus-orchard in https://github.com/eclipse/milo/issues/189#issuecomment-518794750