OPCFoundation / UA-.NETStandard

OPC Unified Architecture .NET Standard

OPC UA Server becomes unresponsive after 1-2 days of continuous run #2716

Closed · vinaybr closed this 2 months ago

vinaybr commented 2 months ago

Type of issue

Current Behavior

I have an OPC UA server running as a Windows service. We recently upgraded the OPC Foundation stack to v1.5.374.78. During one of our long-run tests we observed that the client disconnected from the server after 1-2 days of continuous operation. The server process is still alive, i.e. there is no crash, and the Windows service is still in the Running state, but the client gets a BadTimeout error whenever it tries to reconnect.
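
For reference, this is roughly how we probe the server from a small test client. It is a minimal sketch, not our production code: `ConnectProbe`, the session name, and the 60 s timeout are placeholders, and the exact overloads of `Session.Create` / `CoreClientUtils.SelectEndpoint` may differ slightly between stack versions.

```csharp
using System.Threading.Tasks;
using Opc.Ua;
using Opc.Ua.Client;

// Hypothetical probe that checks whether the server still accepts sessions.
// `config` must be a fully initialized client ApplicationConfiguration.
static class ConnectProbe
{
    public static async Task<bool> TryConnectAsync(ApplicationConfiguration config, string endpointUrl)
    {
        try
        {
            var description = CoreClientUtils.SelectEndpoint(endpointUrl, useSecurity: true);
            var endpoint = new ConfiguredEndpoint(null, description, EndpointConfiguration.Create(config));

            // CreateSession is where the hung server stops responding.
            using var session = await Session.Create(
                config, endpoint, false, "probe", 60_000, new UserIdentity(), null);
            return session.Connected;
        }
        catch (ServiceResultException sre) when (sre.StatusCode == StatusCodes.BadTimeout)
        {
            return false;   // the same BadTimeout the client reports
        }
    }
}
```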

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- OS: Windows 10
- Environment: Visual Studio 2022, .NET 6
- Runtime: .NET 6
- NuGet Version: 1.5.374.78
- Component: OPCFoundation.NetStandard.Opc.Ua
- Server: COMIOP
- Client: UAExpert

Anything else?

I captured a dump of the OPC server process, and analysis of the threads shows that many of them are stuck waiting for a lock:

[HelperMethodFrame_1OBJ: 00000052cabff308] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef)
Opc.Ua.Server.ServerInternalData.get_IsRunning()
Opc.Ua.Server.StandardServer.ValidateRequest(Opc.Ua.RequestHeader, Opc.Ua.Server.RequestType)
Opc.Ua.Server.StandardServer.CreateSession(Opc.Ua.RequestHeader, Opc.Ua.ApplicationDescription, System.String, System.String, System.String, Byte[], Byte[], Double, UInt32, Opc.Ua.NodeId ByRef, Opc.Ua.NodeId ByRef, Double ByRef, Byte[] ByRef, Byte[] ByRef, Opc.Ua.EndpointDescriptionCollection ByRef, Opc.Ua.SignedSoftwareCertificateCollection ByRef, Opc.Ua.SignatureData ByRef, UInt32 ByRef)
Opc.Ua.SessionEndpoint.CreateSession(Opc.Ua.IServiceRequest)
Opc.Ua.EndpointBase+ProcessRequestAsyncResult.OnProcessRequest(System.Object)
Opc.Ua.ServerBase+RequestQueue.OnProcessRequestQueue(System.Object)
[DebuggerU2MCatchHandlerFrame: 00000052cabffae0]
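
To make the dump easier to read, here is a minimal, self-contained illustration of the kind of lock-ordering deadlock the stack suggests; the class and member names are placeholders, not the stack's actual internals.

```csharp
using System.Threading;

// Illustrative only: two locks taken in opposite order by two code paths.
class DeadlockSketch
{
    private readonly object _serverLock = new object();
    private readonly object _channelLock = new object();

    // Thread A: channel cleanup takes _channelLock first, then wants _serverLock.
    public void CleanupChannels()
    {
        lock (_channelLock)
        {
            Thread.Sleep(10);                 // widen the race window for the demo
            lock (_serverLock) { /* update server state */ }
        }
    }

    // Thread B: request validation takes _serverLock first, then wants _channelLock.
    public bool IsRunning()
    {
        lock (_serverLock)
        {
            Thread.Sleep(10);
            lock (_channelLock) { return true; }
        }
    }
}
// If one thread holds _channelLock while another holds _serverLock, both block
// forever, and every subsequent request (CreateSession -> ValidateRequest ->
// IsRunning) queues up on the server lock, which matches the
// Monitor.ReliableEnter frames in the dump above.
```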

mregen commented 2 months ago

@vinaybr thanks for the report. We have known the root cause for a few days: a regression that can cause a deadlock when cleaning up the channels. A fix is in the works in #2714.
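
For anyone hitting this before the fix lands: the sketch below is not the actual change in #2714, just a common pattern for avoiding this class of deadlock (snapshot the channels under the lock, dispose them outside it); all names are illustrative.

```csharp
using System;
using System.Collections.Generic;

class ChannelCleanupSketch
{
    private readonly object _channelLock = new object();
    private readonly List<IDisposable> _openChannels = new List<IDisposable>();

    public void CleanupChannels()
    {
        List<IDisposable> toClose;
        lock (_channelLock)
        {
            // Copy the list while holding the lock, then release it immediately.
            toClose = new List<IDisposable>(_openChannels);
            _openChannels.Clear();
        }

        // Dispose outside the lock; any lock taken inside Dispose can no longer
        // form a cycle with _channelLock.
        foreach (var channel in toClose)
        {
            channel.Dispose();
        }
    }
}
```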