convertersystems / opc-ua-client

Visualize and control your enterprise using OPC Unified Architecture (OPC UA) and Visual Studio.
MIT License
405 stars 119 forks source link

Error publishing subscription. The session cannot be used because ActivateSession has not been called. #137

Closed vladislav-mitev closed 4 years ago

vladislav-mitev commented 4 years ago

We are using the opc-ua-client library to connect to multiple OPC UA servers, and we are having issues only with one server. We are using client library version 2.2.1 What we do in short:

  1. Open a channel
  2. Read a few tags
_channel = new UaTcpSessionChannel (
    appDescription,
    certificateStore,
    ShowSignInDialog,
    remoteEndpoint: new EndpointDescription {
        EndpointUrl = DiscoveryUrl,
        SecurityPolicyUri = securityPolicyUri,
        SecurityMode = messageSecurityMode },
    loggerFactory: EnableInternalLogs ? loggerFactory : new LoggerFactory(),
    options: new UaTcpSessionChannelOptions()
    {
        TimeoutHint = SessionChannelTimeout
    }
);
await _channel.OpenAsync().ConfigureAwait(false);
...
ReadValueId rVal = new ReadValueId
{
    NodeId = BuildNodeId(namespaceIdx, id, idType),
    AttributeId = AttributeIds.Value
};
ReadResponse response = await _channel.ReadAsync(new ReadRequest()
{
    NodesToRead = new ReadValueId[] { rVal }
}).ConfigureAwait(false);

Log(LogLevel.Info, $"{rVal.NodeId.NamespaceIndex}:{rVal.NodeId.Identifier}={response.Results[0].Value}");

What happens:

Here are all relevant logs, those marked [INTERNAL] are from the client library:

[2020-02-24 14:03:37.3566] [DEBUG] Creating a session with the server... [2020-02-24 14:03:37.3566] [DEBUG] SecurityPolicyUri: 'http://opcfoundation.org/UA/SecurityPolicy#None'. [2020-02-24 14:03:37.3566] [DEBUG] SecurityMode: 'None'. [2020-02-24 14:03:37.3625] [DEBUG] UserIdentity: ''. [2020-02-24 14:03:37.3625] [INFO] [INTERNAL] Discovering endpoints of 'opc.tcp://192.168.1.120:4840'. [2020-02-24 14:03:37.3625] [DEBUG] [INTERNAL] Channel opening. [2020-02-24 14:03:37.4267] [DEBUG] [INTERNAL] Sending OpenSecureChannelRequest, Handle: 1 [2020-02-24 14:03:37.4622] [DEBUG] [INTERNAL] Received OpenSecureChannelResponse, Handle: 1 Result: 0x00000000 [2020-02-24 14:03:37.4737] [DEBUG] [INTERNAL] Channel opened. [2020-02-24 14:03:37.4737] [DEBUG] [INTERNAL] Sending GetEndpointsRequest, Handle: 2 [2020-02-24 14:03:37.4909] [DEBUG] [INTERNAL] Installed new security token 1. [2020-02-24 14:03:37.4960] [DEBUG] [INTERNAL] Received GetEndpointsResponse, Handle: 2 Result: 0x00000000 [2020-02-24 14:03:37.5047] [DEBUG] [INTERNAL] Channel closing. [2020-02-24 14:03:37.5082] [DEBUG] [INTERNAL] Sending CloseSecureChannelRequest, Handle: 3 [2020-02-24 14:03:37.5269] [ERROR] [INTERNAL] Error closing secure channel. Timeout occurred while processing the request. [2020-02-24 14:03:37.5269] [DEBUG] [INTERNAL] Channel closed. [2020-02-24 14:03:37.5325] [DEBUG] [INTERNAL] Success discovering endpoints of 'opc.tcp://192.168.1.120:4840'. [2020-02-24 14:03:37.5374] [DEBUG] Client is building Anonymous identity... [2020-02-24 14:03:37.5374] [DEBUG] Done building identity. [2020-02-24 14:03:37.5374] [DEBUG] [INTERNAL] Channel opening. [2020-02-24 14:03:37.5387] [DEBUG] Opening channel with endpoint 'opc.tcp://192.168.1.120:4840'. [2020-02-24 14:03:37.5610] [INFO] [INTERNAL] Opening session channel with endpoint 'opc.tcp://192.168.1.120:4840'. [2020-02-24 14:03:37.5610] [INFO] [INTERNAL] SecurityPolicy: 'http://opcfoundation.org/UA/SecurityPolicy#None'. [2020-02-24 14:03:37.5610] [INFO] [INTERNAL] SecurityMode: 'None'. [2020-02-24 14:03:37.5610] [INFO] [INTERNAL] UserIdentity: 'Workstation.ServiceModel.Ua.AnonymousIdentity'. [2020-02-24 14:03:37.5756] [DEBUG] [INTERNAL] Sending OpenSecureChannelRequest, Handle: 1 [2020-02-24 14:03:37.5790] [DEBUG] [INTERNAL] Received OpenSecureChannelResponse, Handle: 1 Result: 0x00000000 [2020-02-24 14:03:37.5835] [DEBUG] [INTERNAL] Sending CreateSessionRequest, Handle: 2 [2020-02-24 14:03:37.6029] [DEBUG] [INTERNAL] Installed new security token 1. [2020-02-24 14:03:37.6029] [DEBUG] [INTERNAL] Received CreateSessionResponse, Handle: 2 Result: 0x00000000 [2020-02-24 14:03:37.6120] [DEBUG] [INTERNAL] Sending ActivateSessionRequest, Handle: 3 [2020-02-24 14:03:37.6321] [DEBUG] [INTERNAL] Received ActivateSessionResponse, Handle: 3 Result: 0x00000000 [2020-02-24 14:03:37.6368] [DEBUG] [INTERNAL] Sending ReadRequest, Handle: 4 [2020-02-24 14:03:37.6636] [DEBUG] [INTERNAL] Received ReadResponse, Handle: 4 Result: 0x00000000 [2020-02-24 14:03:37.6701] [DEBUG] [INTERNAL] Sending CreateSubscriptionRequest, Handle: 5 [2020-02-24 14:03:37.6806] [DEBUG] [INTERNAL] Received CreateSubscriptionResponse, Handle: 5 Result: 0x00000000 [2020-02-24 14:03:37.6831] [DEBUG] [INTERNAL] Channel opened. [2020-02-24 14:03:37.6831] [DEBUG] Opened channel with endpoint 'opc.tcp://192.168.1.120:4840'. [2020-02-24 14:03:37.6953] [DEBUG] [INTERNAL] Sending PublishRequest, Handle: 6 [2020-02-24 14:03:37.6988] [DEBUG] [INTERNAL] Sending ReadRequest, Handle: 7 [2020-02-24 14:03:37.7046] [DEBUG] [INTERNAL] Received ReadResponse, Handle: 7 Result: 0x00000000 [2020-02-24 14:03:37.7078] [INFO] 3:0001_Alarms= [2020-02-24 14:03:37.7078] [DEBUG] [INTERNAL] Sending ReadRequest, Handle: 8 [2020-02-24 14:03:37.7078] [DEBUG] [INTERNAL] Sending PublishRequest, Handle: 9 [2020-02-24 14:03:37.7078] [DEBUG] [INTERNAL] Sending PublishRequest, Handle: 10 [2020-02-24 14:03:37.7145] [DEBUG] [INTERNAL] Received ReadResponse, Handle: 8 Result: 0x00000000 [2020-02-24 14:03:37.7145] [INFO] 3:0002_State= [2020-02-24 14:03:37.7145] [DEBUG] [INTERNAL] Sending ReadRequest, Handle: 11 [2020-02-24 14:03:37.7248] [DEBUG] [INTERNAL] Received ReadResponse, Handle: 11 Result: 0x00000000 [2020-02-24 14:03:37.7248] [INFO] 3:0003_Mode= [2020-02-24 14:03:37.7248] [INFO] Created subscription. [2020-02-24 14:03:38.6809] [DEBUG] [INTERNAL] Received PublishResponse, Handle: 6 Result: 0x00000000 [2020-02-24 14:03:38.6855] [DEBUG] [INTERNAL] Sending PublishRequest, Handle: 12 [2020-02-24 14:04:08.6908] [DEBUG] [INTERNAL] Received PublishResponse, Handle: 9 Result: 0x00000000 [2020-02-24 14:04:08.6908] [DEBUG] [INTERNAL] Sending PublishRequest, Handle: 13 [2020-02-24 14:04:08.6970] [DEBUG] [INTERNAL] Received ServiceFault, Handle: 13 Result: 0x80270000 [2020-02-24 14:04:08.7014] [ERROR] [INTERNAL] Error publishing subscription. The session cannot be used because ActivateSession has not been called.

Do we do something wrong or is there a flaw in the client library? Any help is highly appreciated!

awcullen commented 4 years ago

If UAExpert works fine, then there must be a flaw! What is the manufacturer of the OPC UA server at 'opc.tcp://192.168.1.120:4840'?
You can help me by logging the status code of the Read results. Log(LogLevel.Info, $"{rVal.NodeId.NamespaceIndex}:{rVal.NodeId.Identifier}={response.Results[0].Value}, {response.Results[0].StatusCode}");

vladislav-mitev commented 4 years ago

The server is a Siemens Simatic S7-1500: https://www.siemens.com/s7-1500 In the provided logs above there is the status code of each Read response. However, reading of data and subscribing for data change events (CreateMonitoredItemsRequest) is OK, as we found that our identifier was wrong. However, the error persists.

We found that the error is affected by the KeepAliveCount setting on the subscription which we use for our CreateMonitoredItemsRequest. The error occurres for KeepAliveCount values: 20, 30, 31, 32, 33, 38, 45, 48 Everything is OK for KeepAliveCount values: 10, 25, 26, 27, 28, 29, 41, 49, 50, 74

Other subscription settings that we use: PublishingInterval=1000; LifetimeCount=30; PublishingEnabled=true.

UPDATE: Actually some "good" settings from above are also raising errors, but after a lot of time. So, for us the only known working configuration is: PublishingInterval=500; KeepAliveCount=10; LifetimeCount=30; PublishingEnabled=true.

Galama1979 commented 4 years ago

Hello, I can confirm the issue with a Siemens S7-1500 and the integrated OPC UA server.

The error occurres for KeepAliveCount values: 30, 31, 32, 33, 38, 45, 48 Everything is OK for KeepAliveCount values: 10, 20, 25, 26, 27, 28, 29, 41, 49, 50, 74

@vladislav-mitev thank you for this hint, you made my day!

We have phases without any active subscriptions or any OpcUa communication, in this case the StateMachineAsync tasks of the UaTcpSessionChannel itsef fail after 1 Minute of inactivity causing the next ReadAsync/WriteAsync to fail with an "The session id is not valid.". I did not observe a state transition in the session channel, the Faulted or Closed events of the session channel are not invoked.

The log shows, that there is no communication activity after the opening of the channel, the S7-1500 is also idle.

2020-03-04 12:10:22,023 [11] Information OpcUa.Workstation.UaClient: Opening session channel with endpoint 'opc.tcp://192.168.214.1:4840/'. 2020-03-04 12:10:22,023 [11] Information OpcUa.Workstation.UaClient: SecurityPolicy: 'http://opcfoundation.org/UA/SecurityPolicy#Basic256Sha256'. 2020-03-04 12:10:22,023 [11] Information OpcUa.Workstation.UaClient: SecurityMode: 'SignAndEncrypt'. 2020-03-04 12:10:22,023 [11] Information OpcUa.Workstation.UaClient: UserIdentity: 'Workstation.ServiceModel.Ua.AnonymousIdentity'. 2020-03-04 12:10:24,727 [4] Information LowLevelPlcOpcUa: SessionChannel 'ns=1;i=819712547' was opened! 2020-03-04 12:10:55,818 [7] Error OpcUa.Workstation.UaClient: Error publishing subscription. The session cannot be used because ActivateSession has not been called. 2020-03-04 12:11:25,823 [10] Error OpcUa.Workstation.UaClient: Error publishing subscription. The session cannot be used because ActivateSession has not been called. 2020-03-04 12:11:55,828 [6] Error OpcUa.Workstation.UaClient: Error publishing subscription. The session cannot be used because ActivateSession has not been called.

Changing the DefaultKeepaliveCount used for the internal Subscription in UaTcpSessionChannel.OnOpenAsync to 29 as suggested by @vladislav-mitev "resolves" the issue.

Galama1979 commented 4 years ago

@awcullen Hi Andrew, is there anything I could do to assist in tracking down this issue? As it seems to be connected to the Siemens S7-1500, I could be hard for you to reproduce or analyze the issue. I would try to get verbose logging, but I need some push in the right direction where to look for the error. Could you give me any hint where to start?

awcullen commented 4 years ago

I got as far as installing S7-PLCSIM Advanced and TIA Portal. Next, I need to set up a test to reproduce the issue. I want to thank you for all your work in finding a work around.

awcullen commented 4 years ago

Well, I did try to catch this issue. Please set the LifetimeCount = 3 * MaxKeepAliveCount

// build a CreateSubscriptionRequest. See 'OPC UA Spec Part 4' paragraph 5.13.2
var req = new CreateSubscriptionRequest
{
    // the publish interval in milliseconds
    RequestedPublishingInterval = 1000.0, 
    // the number of publish intervals the server will wait to send an empty 'keep-alive' response
    RequestedMaxKeepAliveCount = 30,  
    // the number of publish intervals the server will wait to close the subscription
    RequestedLifetimeCount = 30 * 3,
    PublishingEnabled = true,
};
Galama1979 commented 4 years ago

Well, I did try to catch this issue. Please set the LifetimeCount = 3 * MaxKeepAliveCount

// build a CreateSubscriptionRequest. See 'OPC UA Spec Part 4' paragraph 5.13.2
var req = new CreateSubscriptionRequest
{
    // the publish interval in milliseconds
    RequestedPublishingInterval = 1000.0, 
    // the number of publish intervals the server will wait to send an empty 'keep-alive' response
    RequestedMaxKeepAliveCount = 30,  
    // the number of publish intervals the server will wait to close the subscription
    RequestedLifetimeCount = 30 * 3,
    PublishingEnabled = true,
};

Thank you for your effort, I am back in the office and have access to the Siemens HW. I will let you know the results as soon as I ran some tests.

vladislav-mitev commented 4 years ago

Thank you, @awcullen , for your great support! This indeed seems to be the real answer to our problems.

Galama1979 commented 4 years ago

Hello @awcullen, thank you again for checking this issue. I ran some tests as promised, but I dont see any clearer. Maybe my understanding of the OpcUa system is not suffiecient, or the issue from @vladislav-mitev and me are not really identical und do only show the same symptoms.

My issue is the failure of the internal keep-alive subscription/state machine taks. I have checked the values in UaTcpSessionChannel you are using internally.

UaTcpSessionChannel.cs:

...
        public const double DefaultPublishingInterval = 1000f;
        public const uint DefaultKeepaliveCount = 30;
...
            // create the keep alive subscription.
            var subscriptionRequest = new CreateSubscriptionRequest
            {
                RequestedPublishingInterval = DefaultPublishingInterval,
                RequestedMaxKeepAliveCount = DefaultKeepaliveCount,
                RequestedLifetimeCount = DefaultKeepaliveCount * 3,
                PublishingEnabled = true,
            };

...

These are excatly the values suggested by you, so they should be "good" values according to the OpcUaStandard. However after 30 seconds of inactivity on the channel (no subscriptions, no read/write requests,...), I get the "Error publishing subscription" from the state machine tasks inside the UaTcpSessionChannel. I would expect the channel to be kept open, even when it is not used at the moment. Are my expectations wrong?

When I setup an subscription to a monitored item, using your suggested settings, neither subscription times out, but for this is more of a workaround.

I hope my description is not too confusing. Any help is highly appreceated.

awcullen commented 4 years ago

Would you share the firmware version of the S7-1500 controller. I'm using the S7-PLCSIM Advanced soft plc for testing. Knowing the firmware version will eliminate a variable in the testing.

Galama1979 commented 4 years ago

Knowing the firmware version will eliminate a variable in the testing.

Firmware Version is v02.06.01

awcullen commented 4 years ago

Ok. I reproduced the same issue. That's the first step in fixing this!

awcullen commented 4 years ago

To fix this issue, I will release a new version 2.5.2 in the coming days. To work around this issue today, you can changes the MaxSessionTimeout setting from 30 sec to 120 sec and download to your controller. Thanks again for reporting this issue. image

vladislav-mitev commented 4 years ago

To fix this issue, I will release a new version 2.5.2 in the coming days. To work around this issue today, you can changes the MaxSessionTimeout setting from 30 sec to 120 sec and download to your controller. Thanks again for reporting this issue. image

Could you please explain the reason behind all this. Understanding it would help us locate potential problems in our current implementations! Many, many thanks and appreciations!

awcullen commented 4 years ago

Yes, let me explain. The S7-1500 setting 'Max session timeout' is the number of seconds that a session connection may be unused before being closed by the server. In this version of firmware, the default timeout is 30 seconds (rather a short time). So, if a opc client does nothing with the connection for 30 seconds, the opc server will just close the connection. The UaTcpSessionChannel uses a background subscription as a keep-alive message. This subscription has a keep-alive interval of 30 seconds. Unfortunately the session timeout always won the battle and would close the connection. I plan to change this keep-alive interval to be 5 seconds, the same as UAExpert. This 5 second interval is only for the single internal subscription. Your subscriptions should use what is best for your program. The defaults will remain 30 secs for keep-alive, 3 * 30 seconds for lifetime.

Galama1979 commented 4 years ago

@awcullen Great work, thank you for all the effort you put in this issue. Looking forward to the next release,

awcullen commented 4 years ago

I have uploaded v2.5.2 to NuGet. I would love to know if this fixes the problem.

Galama1979 commented 4 years ago

Sorry for the late reply, I was in HomeOffice for some days due to the plague.

I did a quick test with v2.5.2:

I don't see any sense in running up a long-term test. Wohever if this would help you to finally close the issue I could set up one.

awcullen commented 4 years ago

Thank you Galama. Be healthy and safe.