FreeOpcUa / python-opcua

LGPL Pure Python OPC-UA Client and Server
http://freeopcua.github.io/
GNU Lesser General Public License v3.0

Secure channel response error #1122

Open karthikvr1 opened 3 years ago

karthikvr1 commented 3 years ago

Hello experts, I am trying to establish a connection to a device (bnrx20) and subscribe to data changes. After a while, I get the following error:

Callback: New data change ns=2;i=212992 0
Callback: New data change ns=2;i=212994 632
ERROR:concurrent.futures:exception calling callback for <Future at 0x4c37e80 state=cancelled>
Traceback (most recent call last):
  File "C:\Users\Python\Python38-32\lib\concurrent\futures\_base.py", line 328, in _invoke_callbacks
    callback(self)
  File "C:\Users\Python\Python38-32\lib\site-packages\opcua\client\ua_client.py", line 201, in clb
    response = struct_from_binary(ua.OpenSecureChannelResponse, future.result())
  File "C:\Users\Python\Python38-32\lib\concurrent\futures\_base.py", line 430, in result
    raise CancelledError()
concurrent.futures._base.CancelledError

Can you please help me understand the cause and how to resolve it? The machine running the script and the device it connects to are in the same lab, so I have ruled out latency and timeout issues.

I have also attached the script: x20_opcua_client.py.txt

AndreasHeine commented 3 years ago

I guess it's an issue with your threads... To be honest, I would use asyncua (the non-blocking python-opcua) for that, so you don't need to take care of threads :) asyncio is a lot easier to handle than threading!

karthikvr1 commented 3 years ago

Hello Andreas, thanks a lot for your response. I use the additional thread to gather data and dump it into the database occasionally, not for every data point collected. Will that be an impediment? I am curious to know. Thanks in advance.

Regards, Karthik VR
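The pattern described above (collect notifications, write them to the database in batches from a separate thread) can be sketched with only the standard library. This is a minimal sketch under stated assumptions, not the attached script: the handler only puts (node, value, timestamp) tuples on a queue.Queue, and a worker thread drains it and passes each batch to `write_batch`, a hypothetical stand-in for the real database insert. The worker never talks to the OPC UA server, which keeps all client requests on the thread that owns the client. `QueueingSubHandler`, `batch_writer`, `start_writer` and `stop_writer` are illustrative names.

```python
import queue
import threading
import time

_STOP = object()  # sentinel that tells the writer thread to finish


class QueueingSubHandler:
    """Shaped like the SubHandler in this thread, but it only enqueues and returns."""

    def __init__(self, q):
        self.q = q

    def datachange_notification(self, node, val, data):
        # No server requests and no database work here: just hand the value over.
        self.q.put((str(node), val, time.time()))


def batch_writer(q, write_batch, max_batch=100, flush_every_s=5.0):
    """Drain q and pass rows to write_batch in batches of at most max_batch."""
    rows = []
    last_flush = time.monotonic()
    while True:
        try:
            item = q.get(timeout=flush_every_s)
        except queue.Empty:
            item = None  # nothing arrived; still check whether a time-based flush is due
        if item is _STOP:
            break
        if item is not None:
            rows.append(item)
        flush_due = time.monotonic() - last_flush >= flush_every_s
        if rows and (len(rows) >= max_batch or flush_due):
            write_batch(rows)  # hypothetical DB insert, e.g. cursor.executemany(...)
            rows = []
            last_flush = time.monotonic()
    if rows:
        write_batch(rows)  # flush the remainder on shutdown


def start_writer(write_batch, **kwargs):
    """Start the writer thread once and return (queue, thread)."""
    q = queue.Queue()
    t = threading.Thread(target=batch_writer, args=(q, write_batch), kwargs=kwargs, daemon=True)
    t.start()
    return q, t


def stop_writer(q, t):
    """Ask the writer to flush and exit, then join it exactly once."""
    q.put(_STOP)
    t.join()
```

Usage: pass `QueueingSubHandler(q)` to `client.create_subscription(500, handler)` instead of a handler that does the work inline, and call `stop_writer(q, t)` on shutdown. If `write_batch` uses sqlite3, open the connection inside the writer thread, since sqlite3 connections are tied to their creating thread by default.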

AndreasHeine commented 3 years ago

I hope I don't misread your code, but it looks like you have a main thread with your client instance, and then you start a new thread ("t1") and make requests to the server from this second thread (the "get_all_node_data" method). I am not sure that works this way... Typically it's easier to use async programming for network- or database-related stuff. Threading in Python is not like in other languages: threads and asyncio both run inside a single process, and because of the GIL, Python threads don't execute Python code on multiple cores in parallel. It is easy to make a mistake with threads (that's why I don't like them, and they aren't fast...). If you used asyncio, you could append tasks (one for each notification) to the event loop, and your code would be much simpler.

@swamper123 any suggestions? (I haven't used threads that much...)
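A minimal, stdlib-only sketch of the "one task per notification" idea described above. The handler keeps the `datachange_notification(node, val, data)` shape used in this thread, but instead of doing the work inline it schedules a coroutine on the running event loop. It assumes the handler is called on the event loop thread, as in asyncua, where the handler would be passed to something like `await client.create_subscription(500, handler)`; here the notifications are simulated. `TaskSubHandler`, `process` and `simulate` are hypothetical names, and `process` is a placeholder for real async work such as a database write.

```python
import asyncio


class TaskSubHandler:
    """Keeps the datachange_notification(node, val, data) shape from this thread."""

    def __init__(self):
        self.tasks = set()  # strong references so pending tasks aren't garbage-collected
        self.results = []

    def datachange_notification(self, node, val, data):
        # Assumed to run on the event loop thread: don't block, just schedule a task.
        task = asyncio.get_running_loop().create_task(self.process(node, val))
        self.tasks.add(task)
        task.add_done_callback(self.tasks.discard)

    async def process(self, node, val):
        await asyncio.sleep(0)  # placeholder for real async work, e.g. a database write
        self.results.append((node, val))


async def simulate(handler, notifications):
    """Feed fake notifications to the handler, then wait for all scheduled tasks."""
    for node, val in notifications:
        handler.datachange_notification(node, val, None)
        await asyncio.sleep(0)
    if handler.tasks:
        await asyncio.gather(*list(handler.tasks))
```

Because everything runs on one event loop thread, the tasks never touch shared state concurrently, which is the simplification the comment refers to.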

swamper123 commented 3 years ago

It has been a while since I used threads in Python, tbh. But I totally agree that it is easier to handle asyncio and Futures than threads. With a good IDE you can even understand what may have crashed in asyncio, which is more difficult with threads (where good logging is necessary).

Anyway:

@karthikvr1 How long is "a while"? An hour? A few minutes? Does it appear regularly (e.g. every 5 minutes)? This may help narrow down the error.

I can't put my finger on it, but I have the feeling that client.connect() failed and this exception wasn't caught properly (I only found one reference to concurrent.futures.CancelledError). You may catch it if you replace except Exception as e: with except BaseException as e:. Why the connection may have failed, I can't tell you, because this can have multiple (magic) causes and I don't have your setup.

EDIT: I just realised that you already know it is a connection issue.
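The traceback in the original post is logged by concurrent.futures itself: when a Future is cancelled, its done-callbacks run, and an exception raised inside a callback is caught and logged under the `concurrent.futures` logger instead of being propagated. The stdlib-only sketch below reproduces that log pattern with a callback shaped like the `clb` in the traceback (it calls `future.result()` unconditionally), plus a guarded variant that checks `future.cancelled()` first. Both callbacks are illustrations, not python-opcua's actual code, and `cancel_with_callback` is a hypothetical helper that captures the log records.

```python
import concurrent.futures
import logging


class _ListHandler(logging.Handler):
    """Collects log records so the sketch can inspect them."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def unguarded_clb(future):
    # Same shape as clb in the traceback: result() raises CancelledError
    # when the future was cancelled.
    return future.result()


def guarded_clb(future):
    # Hypothetical defensive variant: skip processing if the request was cancelled.
    if future.cancelled():
        return None
    return future.result()


def cancel_with_callback(callback):
    """Attach callback to a pending Future, cancel it, return the captured log records."""
    log = logging.getLogger("concurrent.futures")
    handler = _ListHandler()
    log.addHandler(handler)
    old_propagate = log.propagate
    log.propagate = False  # keep the demo quiet on stderr
    try:
        fut = concurrent.futures.Future()
        fut.add_done_callback(callback)
        fut.cancel()  # runs the callback; an exception it raises is logged, not re-raised
    finally:
        log.removeHandler(handler)
        log.propagate = old_propagate
    return handler.records
```

In this reproduction the `CancelledError` never reaches the code that called `cancel()`, so a `try/except` around that caller would not see it; only the log line appears.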

EDIT 2: @karthikvr1 A question, because I don't know the combination you used: while gProceed is true, you always call t1.join(), but never start the thread again. Is that legit?

karthikvr1 commented 3 years ago

Hello Andreas, thanks for the pointer. I am using the function get_all_node_data() to get the node display name. I didn't know that it would spawn another thread. I use this call to identify the data source from a list of data sources. I will implement an internal lookup table instead of calling the OPC UA server every time.
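The internal lookup table mentioned above could be as small as a dictionary that is filled once and then answers locally. This is a minimal sketch: `DisplayNameCache` is a hypothetical name, and `fetch` is an injected function that performs the real server call. With python-opcua that would be something like `client.get_node(nodeid).get_display_name().Text`, which is an assumption to verify against your library version. Calling `preload` from the thread that owns the client keeps server requests off the worker thread.

```python
from typing import Callable, Dict, Iterable


class DisplayNameCache:
    """Hypothetical lookup table: ask the server once per node, then answer locally."""

    def __init__(self, fetch: Callable[[str], str]):
        # fetch(nodeid) performs the real server request (injected, see lead-in).
        self._fetch = fetch
        self._names: Dict[str, str] = {}

    def preload(self, nodeids: Iterable[str]) -> None:
        """Fill the table up front, on the thread that owns the client."""
        for nodeid in nodeids:
            self.lookup(nodeid)

    def lookup(self, nodeid: str) -> str:
        # Keys must be used in one consistent form, e.g. NodeId strings like "ns=2;i=212994".
        if nodeid not in self._names:
            self._names[nodeid] = self._fetch(nodeid)
        return self._names[nodeid]
```

After `preload`, later lookups for known nodes are plain dictionary reads and never reach the server.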

Hello Swamper, the error is frequently observed after around 7 to 8 hours. I start the script in the morning and see the error once during the day. I restart the script before going to bed and see the error again in the morning or a couple of hours later.

Thanks for this observation; glad to expand on it. The flag gProceed is always true and is set to false when I hit CTRL+C. In the recent modification, I moved the thread join call into the exception handler where I set the flag to false.
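The lifecycle described above (a flag that stays true until Ctrl+C, with the join moved into the exception handler) maps onto a common stdlib pattern: start the worker thread once, let the main thread idle, and on KeyboardInterrupt signal the worker through a threading.Event and join it exactly once. A minimal sketch: `worker` is a hypothetical stand-in for the periodic database job, the Event replaces the global gProceed flag, and `run_for_s` exists only so the sketch can be tried without pressing Ctrl+C.

```python
import threading
import time


def worker(stop, ticks, period_s=0.01):
    """Hypothetical periodic job (e.g. flushing to the database) until stop is set."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        stop.wait(period_s)  # sleeps, but wakes immediately when stop is set


def run(run_for_s=None):
    stop = threading.Event()
    ticks = []
    t1 = threading.Thread(target=worker, args=(stop, ticks), name="t1")
    t1.start()  # started exactly once
    try:
        if run_for_s is None:
            while True:
                time.sleep(1)  # main thread idles; Ctrl+C raises KeyboardInterrupt here
        else:
            time.sleep(run_for_s)  # bounded run, only for trying the sketch out
    except KeyboardInterrupt:
        print("Ctrl+C received, stopping")
    finally:
        stop.set()  # replaces setting a global gProceed = False
        t1.join()  # joined exactly once, after the stop signal
    return t1, ticks
```

Using `stop.wait(period_s)` instead of `time.sleep` inside the worker means the join returns promptly instead of waiting out the rest of a sleep.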

I have the following lean code running in my environment.

Code Start----------------------------------------

from opcua import Client


class SubHandler(object):
    def datachange_notification(self, node, val, data):
        print("New data change ", node, val)


if __name__ == "__main__":
    client = Client("opc.tcp://192.168.1.1:4840/")
    client.connect()
    myvar = client.get_node("ns=2;i=212994")
    handler = SubHandler()
    sub = client.create_subscription(500, handler)  # 500 ms publishing interval
    handle = sub.subscribe_data_change(myvar)

Code End----------------------------------------

It has run for one full day without any errors. I will try to build something around this without using threads in the same code.

Also attached are the library changes that I had to make to get the script working. Can you kindly shed some light on which version of the library you are using?

swamper123 commented 3 years ago

TBH, I don't use python-opcua at all, but I recommend the newest master branch. I work with opcua-asyncio for the reasons Andreas and I mentioned. They still look similar in many cases, and sometimes people raise good questions/issues either here or there, so both repos help each other. That's why I'm around here. ^^

The change you made to the client connection timeout relates to a known issue. It was mentioned in #1117, and I know in others as well, but I can't find them now. It was something about resubscribing to nodes after a session restore (has there been a fix, @AndreasHeine? I can't remember one at the moment).

A tip: while running/testing such things, monitor your resources (threads, RAM) over time. RAM may rise up to a certain level, after which the garbage collector should kick in. The thread count should stay fairly constant over time (unless you are creating/joining threads on purpose).
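The resource monitoring suggested above can start with the standard library alone. A minimal sketch: `resource_snapshot` and `monitor` are hypothetical names, the thread figures come from `threading`, and memory is measured with `tracemalloc`, which only counts allocations made by Python code after tracing starts (and adds some overhead). Whole-process memory would need OS tools or a third-party package such as psutil, which is not used here.

```python
import threading
import tracemalloc


def resource_snapshot():
    """Return thread and Python-heap figures worth logging periodically."""
    if not tracemalloc.is_tracing():
        tracemalloc.start()  # tracks Python allocations from this point on
    current, peak = tracemalloc.get_traced_memory()
    return {
        "threads": threading.active_count(),
        "thread_names": sorted(t.name for t in threading.enumerate()),
        "py_heap_current_kib": current // 1024,
        "py_heap_peak_kib": peak // 1024,
    }


def monitor(stop, period_s=60.0, log=print):
    """Log a snapshot every period_s seconds until stop (a threading.Event) is set."""
    while not stop.is_set():
        log(resource_snapshot())
        stop.wait(period_s)
```

Over an 8-hour run, a thread count or thread-name list that keeps growing would point at threads being created and never joined.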

AndreasHeine commented 3 years ago

@karthikvr1 @swamper123 tbh there is no fix at the moment. For me, the most robust way is to monitor the connection (with a cyclic node read) and, if it fails, try to disconnect and delete the subscription, then connect and subscribe again... There are many cases where a connection could break!
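The approach described above (cyclic read, then delete the subscription, disconnect, connect and resubscribe) can be sketched as a small watchdog. This is a minimal sketch, not a library feature: `ConnectionWatchdog` is a hypothetical helper, and it assumes a client with python-opcua's `connect()`, `disconnect()`, `get_node()`, `create_subscription(period, handler)`, plus `Node.get_value()`, `Subscription.subscribe_data_change()` and `Subscription.delete()`. It also assumes a fresh client object is created for every reconnect via `make_client`.

```python
import logging

log = logging.getLogger("opcua_watchdog")


class ConnectionWatchdog:
    """Hypothetical helper: detect a dead connection with a cyclic read, then
    delete the subscription, disconnect, connect and subscribe again."""

    def __init__(self, make_client, watch_nodeid, sub_nodeids, handler, publish_ms=500):
        self.make_client = make_client  # returns a new, not yet connected client
        self.watch_nodeid = watch_nodeid  # node read on every tick
        self.sub_nodeids = list(sub_nodeids)
        self.handler = handler
        self.publish_ms = publish_ms
        self.client = None
        self.sub = None
        self.failures = 0

    def _connect_and_subscribe(self):
        self.client = self.make_client()
        self.client.connect()
        self.sub = self.client.create_subscription(self.publish_ms, self.handler)
        for nodeid in self.sub_nodeids:
            self.sub.subscribe_data_change(self.client.get_node(nodeid))

    def _teardown(self):
        # Best effort: the old session may already be gone, so errors are only logged.
        sub, client = self.sub, self.client
        self.sub = self.client = None
        if sub is not None:
            try:
                sub.delete()
            except Exception:
                log.debug("deleting the old subscription failed", exc_info=True)
        if client is not None:
            try:
                client.disconnect()
            except Exception:
                log.debug("disconnecting the old client failed", exc_info=True)

    def tick(self):
        """Run one watchdog cycle; return True if the cyclic read succeeded."""
        try:
            if self.client is None:
                self._connect_and_subscribe()
            self.client.get_node(self.watch_nodeid).get_value()
            return True
        except Exception:
            log.warning("connection check failed, rebuilding it", exc_info=True)
            self._teardown()  # the next tick connects and resubscribes
            self.failures += 1
            return False

    def run(self, stop, period_s=5.0):
        """Call tick() every period_s seconds until stop (a threading.Event) is set."""
        while not stop.is_set():
            self.tick()
            stop.wait(period_s)
        self._teardown()
```

Usage, assuming python-opcua: `ConnectionWatchdog(lambda: Client("opc.tcp://192.168.1.1:4840/"), "i=2258", ["ns=2;i=212994"], SubHandler()).run(stop_event)`, where `i=2258` is the standard ServerStatus.CurrentTime node, a convenient value to read cyclically. Note that the sketch only catches exceptions; a call that hangs would still block the loop.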