Fraunhofer-FIT-DIEN / iec104-python

A Python module to simulate SCADA and RTU communication over protocol 60870-5-104 to research ICT behavior in power grids.
https://iec104-python.readthedocs.io/latest/python/index.html
GNU General Public License v3.0

Python Script stops when the client is stopped without a connection #17

Closed by FernandoMK 2 months ago

FernandoMK commented 7 months ago

Hello. I noticed that when client.stop(), client.reconnect_all(), or connection.close() is called without a valid connection between client and server, the Python script terminates immediately instead of running to the end.

Below is a code example that reproduces this error on my machine:

import c104

IP_ADDRESS = "10.11.1.21" # Any invalid IP
STATION_COMMON_ADDRESS = 1

if __name__ == "__main__":
    client = c104.Client(tick_rate_ms=0, command_timeout_ms=5000)
    connection = client.add_connection(
        ip=IP_ADDRESS, port=2404, init=c104.Init.INTERROGATION
    )
    station = connection.add_station(common_address=STATION_COMMON_ADDRESS)

    while True:
        print("Loop Start")
        client.start()
        client.stop() # or client.reconnect_all()
        print("Loop End") # This never occurs, because the script will stop at client.stop()

Tested on Python 3.10 and 3.12 with C104 1.18.0 and 1.17.1.

Thanks in advance.

m-unkel commented 7 months ago

Thank you for your contribution. I can confirm the issue.

Loop Start
terminate called after throwing an instance of 'std::runtime_error'
  what():  Potential Deadlock: mutex Client::connections_mutex waiting for lock > 100ms
Aborted

There seems to be a deadlock. If acquiring a lock takes much longer than expected, the process exits to avoid undetected deadlocks. I will create a fix, but I must acknowledge that due to current commitments, I won't be able to publish a new release before June.
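
For readers unfamiliar with this kind of safeguard, the idea of failing loudly when a lock cannot be acquired in time can be sketched in Python as follows. This is only an illustration of the pattern, not the library's C++ implementation; the names PotentialDeadlockError, acquire_or_fail, and demo are hypothetical, and the 100 ms limit is taken from the error message above.

```python
import threading


class PotentialDeadlockError(RuntimeError):
    """Hypothetical error raised when a lock is not acquired in time."""


def acquire_or_fail(lock, name, timeout_ms=100):
    # Lock.acquire(timeout=...) returns False if the lock was not obtained in time.
    if not lock.acquire(timeout=timeout_ms / 1000):
        raise PotentialDeadlockError(
            f"Potential Deadlock: mutex {name} waiting for lock > {timeout_ms}ms"
        )


def demo():
    lock = threading.Lock()
    lock.acquire()  # simulate a holder that never releases the lock
    try:
        acquire_or_fail(lock, "Client::connections_mutex", timeout_ms=100)
    except PotentialDeadlockError as exc:
        return str(exc)
    finally:
        lock.release()  # release the simulated holder's lock
    return "acquired"
```

Calling demo() returns the message of the raised error instead of hanging forever, which is the trade-off described above: a visible failure rather than a silent deadlock.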

m-unkel commented 7 months ago

An additional critical consideration: avoid setting tick_rate_ms to 0. Instead, use a higher value, such as 1000 ms (one second), so that the client thread does not keep mutexes blocked for other threads.

A non-zero tick rate introduces a pause between iterations of the client thread. If an iteration completes before the tick rate time elapses, the thread waits for the remainder, and other threads can acquire the mutexes freely during this idle period.

Decreasing the tick rate beyond a certain threshold will not enhance the client's performance. The tick rate primarily influences the speed of reconnections, including the counting of open connections and the initiation sequence steps for establishing connections.
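
The effect of the pause can be illustrated with a small Python sketch. It is not the client's actual loop; run_ticks, demo, and the counter dictionary are hypothetical names used only to show a worker that holds a lock during its iteration and then idles, without the lock, until the tick has elapsed.

```python
import threading
import time


def run_ticks(tick_rate_ms, iterations, lock, counter):
    """Hypothetical worker loop: work under the lock, then idle until the tick ends."""
    for _ in range(iterations):
        started = time.monotonic()
        with lock:
            counter["ticks"] += 1  # the iteration's work happens under the lock
        # Sleep for the rest of the tick WITHOUT holding the lock.
        remaining = tick_rate_ms / 1000 - (time.monotonic() - started)
        if remaining > 0:
            time.sleep(remaining)


def demo(tick_rate_ms=20, iterations=5):
    lock = threading.Lock()
    counter = {"ticks": 0, "other": 0}
    worker = threading.Thread(
        target=run_ticks, args=(tick_rate_ms, iterations, lock, counter)
    )
    worker.start()
    # Another thread (here: the main thread) can take the lock during idle periods.
    while worker.is_alive():
        with lock:
            counter["other"] += 1
        time.sleep(0.001)
    worker.join()
    return counter
```

With tick_rate_ms set to 0 in this sketch, the worker would re-acquire the lock immediately after releasing it, leaving other threads only a very small window to get in.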

FernandoMK commented 7 months ago

> I will create a fix, but I must acknowledge that due to current commitments, I won't be able to publish a new release before June.

Thank you, Martin. Take your time, and I appreciate the effort.

m-unkel commented 7 months ago

If you wish, you can verify the fix by installing from the source branch:

pip install git+https://github.com/Fraunhofer-FIT-DIEN/iec104-python.git@17-python-script-stops-when-the-client-is-stopped-without-a-connection

The fix will be included in the upcoming release.

FernandoMK commented 7 months ago

I ran the same code as before, but using tick_rate_ms=1000. It still stops the script...

However, I managed to run it continuously by introducing a sleep time between start and stop operations, as follows:

import c104
import time

IP_ADDRESS = "10.11.1.41"  # Any invalid IP
STATION_COMMON_ADDRESS = 1

if __name__ == "__main__":
    client = c104.Client(tick_rate_ms=1000, command_timeout_ms=5000)
    connection = client.add_connection(
        ip=IP_ADDRESS, port=2404, init=c104.Init.INTERROGATION
    )
    station = connection.add_station(common_address=STATION_COMMON_ADDRESS)

    while True:
        print("Loop Start")
        client.start()
        time.sleep(2)
        client.stop()  # or client.reconnect_all()
        print("Loop End")

Note: time.sleep(1) seems to work for some time, but eventually, it stops running as well.

m-unkel commented 4 months ago

This should finally be fixed. You can install the dev branch to verify:

pip install git+https://github.com/Fraunhofer-FIT-DIEN/iec104-python.git@dev

I will publish a new release soon.

m-unkel commented 2 months ago

Release 2.0.0 is now available, and it includes the fix for this issue. You no longer need to sleep between start and stop; both routines should generally be much faster now. By default, tick_rate_ms is now 100 ms, as is command_timeout_ms.

I would greatly appreciate your feedback. If the issue is resolved to your satisfaction, please consider closing this thread.

Thank you!

FernandoMK commented 2 months ago

Great to hear, Martin. I will check it out and provide feedback. Thank you very much.

FernandoMK commented 2 months ago

I would like to inform you that the documentation page is down, showing a 404 error. Another minor issue is in the README's Table of Contents, where the item 'wiki' does not exist. I think it should be 'documentation' instead.

FernandoMK commented 2 months ago

It seems that everything is working fine now. Thank you for your dedication to solving this issue. I will be closing the issue now.

coop-open-source commented 2 months ago

> I would like to inform you that the documentation page is down, showing a 404 error. Another minor issue is in the README's Table of Contents, where the item 'wiki' does not exist. I think it should be 'documentation' instead.

Thank you, you are right. I enabled the versioning feature, so the link now needs to be prefixed with /latest/. The Wiki link is easy to fix, but fixing the PyPI documentation link would require creating a new release.