BlackZork / mqmgateway

MQTT gateway for modbus networks
GNU Affero General Public License v3.0

Connection timeout errors from libmodbus #40

Closed alpytron closed 3 months ago

alpytron commented 4 months ago

Hi,

Unfortunately, a flaw in libmodbus causes a lot of "Connection timed out" errors for me. Since libmodbus does not respect the interframe delay requirement of the MODBUS RTU spec, it causes effective data loss and other communication issues when two or more slaves are polled on the same bus.

This issue was raised in 2012, but the developer of libmodbus did not really want to implement the correct timing: https://groups.google.com/g/libmodbus/c/xZR66Gk_G2g

Someone has introduced a fix for it as a pull-request to libmodbus, but it still hasn't been merged: https://github.com/stephane/libmodbus/pull/494

It's important to mention that an interframe delay of 3.5 characters is a must for reliable communication, since there is simply no other way for slave RTUs to recognize frame boundaries. In the case of modmqttd, this problem comes up when a write request to slave B is transmitted right after an acknowledgement frame is received from slave A. modmqttd then sends the write request so quickly that the interframe delay is less than 3.5 characters (especially at low baud rates like 9600). As a result, slave B does not detect the end of the ack frame from slave A and interprets the write request as part of the previous frame, which it discards because the slave address belongs to another slave. This causes a timeout error.
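To put a number on the 3.5-character gap, here is a minimal sketch that computes it for a given baud rate. The function name and the default of 11 bits per character are assumptions made for illustration (the Modbus serial line spec defines an 11-bit RTU character; an 8N1 port like the one in this thread sends 10 bits). The fixed 1.75 ms value above 19200 baud follows the recommendation in that spec. None of this is taken from libmodbus or mqmgateway.

```python
def interframe_delay_s(baud: int, bits_per_char: int = 11) -> float:
    """Return the minimum silent interval (t3.5) between Modbus RTU frames, in seconds.

    bits_per_char is an assumption: 11 for the character format in the spec,
    10 for an 8N1 serial port.
    """
    if baud <= 0:
        raise ValueError("baud must be positive")
    if baud > 19200:
        # The spec recommends a fixed 1.75 ms above 19200 baud.
        return 0.00175
    # 3.5 character times, each character taking bits_per_char / baud seconds.
    return 3.5 * bits_per_char / baud


if __name__ == "__main__":
    for baud in (9600, 19200, 115200):
        print(f"{baud} baud: {interframe_delay_s(baud) * 1000:.2f} ms")
```

At 9600 baud this comes out at roughly 4 ms, so a request sent within a millisecond or two of the previous frame is well inside the window in which other slaves still consider that frame open.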

I'm wondering whether this could be fixed on the application side, in mqmgateway, or whether at least some workaround could be implemented to address this issue. I believe this causes a lot of errors for everyone who is trying to poll more than one device on the same RS485 bus.

Here's a snippet from my syslog: [image]

BlackZork commented 4 months ago

Have you tried setting modbus.min_delay_before_poll or min_delay_before_first_poll in the modbus.slaves list?

Unfortunately there is no min_delay_before_first_poll setting for all slaves, but it is very easy to add.

alpytron commented 4 months ago

min_delay_before_poll is set to 50ms at the moment, but it does not prevent a frame from being sent instantly if the slave address points to another slave. There seems to be no issue at all if I only poll and don't do any writes, but as soon as I write coils or holding registers, those write requests are sent out as soon as possible, so no delay is applied at all.

I've tried routing the serial port traffic through socat to sniff on data flow and I could see the timing error there as well. Unfortunately I haven't saved that log, but the problem could be depicted as:

BlackZork commented 4 months ago

It seems that this is a bug: min_delay_before_poll/first_poll do not affect writes. I should probably rename it to 'min_delay_before_request' and make write requests respect this delay.
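The idea of applying one delay to every request, reads and writes alike, can be sketched as follows. This is only an illustration under stated assumptions, not the mqmgateway implementation: the class name BusSilenceGuard and its methods are hypothetical, and the clock and sleep functions are injectable purely so the logic can be exercised without real waiting.

```python
import time


class BusSilenceGuard:
    """Hypothetical helper: enforce a minimum quiet time before any request."""

    def __init__(self, min_delay_s, clock=time.monotonic, sleep=time.sleep):
        self.min_delay_s = min_delay_s
        self._clock = clock
        self._sleep = sleep
        self._last_activity = None  # no traffic seen yet

    def mark_activity(self):
        """Record the moment the last byte of a frame was sent or received."""
        self._last_activity = self._clock()

    def wait_before_request(self):
        """Sleep until the bus has been quiet for min_delay_s.

        Returns the number of seconds slept (0.0 if no wait was needed).
        """
        if self._last_activity is None:
            return 0.0
        remaining = self.min_delay_s - (self._clock() - self._last_activity)
        if remaining > 0:
            self._sleep(remaining)
            return remaining
        return 0.0
```

A caller would invoke wait_before_request() before sending a poll or a write, and mark_activity() once the response (or the timeout) has been handled, so a write to slave B queued right after an ack from slave A is held back in the same way a poll would be.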

alpytron commented 4 months ago

That is a great idea, I believe. :) It would be even more useful if it could be set in microseconds as well, so the bus bandwidth could be maximized.

BlackZork commented 3 months ago

Well, it started as a simple addition, but ended as usual - as a major rewrite of the Modbus side of this project :-) A short description of all changes can be found in the commit message.

No microseconds yet, but easy to add. I've moved it as a new feature to #41. Feel free to buymeacoffee if this change works for you :-)

alpytron commented 3 months ago

OMG! That's a huge amount of changes. Absolutely deserves the coffee. :D

However, it seems these new config entries don't have any effect when they are specified globally: if I put them in the 'networks' section, the timeout errors still occur. In the 'slaves' section they do take effect and everything works like a charm. Here's the snippet from my current config.yaml:

networks:
    # list of modbus networks to poll from.
    - name: hub1
      # default values for waiting for modbus response
      # how long to wait for the first byte of a response
      response_timeout: 200ms
      # how long to wait for next bytes of a response
      response_data_timeout: 100ms
      # if device is defined then this is a ModbusRTU network
      device: /dev/ttyUSB0
      #device: /dev/ttyV1
      # serial port parameters
      baud: 9600
      parity: N
      data_bit: 8
      stop_bit: 1

      # Optional settings for RTU
      #rtu_serial_mode: rs232
      #rtu_serial_mode: rs485

      #rtu_rts_mode: up
      #rtu_rts_mode: down

      #rtu_rts_delay: 3000   # in microseconds

      # force silence before poll
      # if last poll was from different slave
      # delay_before_first_command: 80ms
      # delay_before_command: 15ms

    #- name: tcptest
      # if address is defined then this is a ModbusTCP network
      #address: 192.168.31.10
      #port: 502

      slaves:
        - address: 1
          delay_before_first_command: 50ms
          poll_groups:
            # Modbus counter stats
            - register: 8
              register_type: input
              count: 6
        - address: 2
          delay_before_first_command: 50ms
          poll_groups:
            # Modbus counter stats
            - register: 8
              register_t

I noticed two minor issues with build and install. First, the build stopped with an error:

$ make clean
$ make
[  4%] Building CXX object libmodmqttsrv/CMakeFiles/modmqttsrv.dir/config.cpp.o
[  8%] Building CXX object libmodmqttsrv/CMakeFiles/modmqttsrv.dir/conv_name_parser.cpp.o
[ 12%] Building CXX object libmodmqttsrv/CMakeFiles/modmqttsrv.dir/debugtools.cpp.o
[ 16%] Building CXX object libmodmqttsrv/CMakeFiles/modmqttsrv.dir/default_command_converter.cpp.o
[ 20%] Building CXX object libmodmqttsrv/CMakeFiles/modmqttsrv.dir/logging.cpp.o
[ 25%] Building CXX object libmodmqttsrv/CMakeFiles/modmqttsrv.dir/modbus_client.cpp.o
[ 29%] Building CXX object libmodmqttsrv/CMakeFiles/modmqttsrv.dir/modbus_context.cpp.o
[ 33%] Building CXX object libmodmqttsrv/CMakeFiles/modmqttsrv.dir/modbus_executor.cpp.o
/home/alpy/mqmgateway/libmodmqttsrv/modbus_executor.cpp:15:41: error: conflicting declaration ‘constexpr const milliseconds modmqttd::ModbusExecutor::WRITE_BATCH_SIZE’
   15 |     constexpr std::chrono::milliseconds ModbusExecutor::WRITE_BATCH_SIZE;
      |                                         ^~~~~~~~~~~~~~
In file included from /home/alpy/mqmgateway/libmodmqttsrv/modbus_executor.cpp:4:
/home/alpy/mqmgateway/libmodmqttsrv/modbus_executor.hpp:15:32: note: previous declaration as ‘constexpr const short int modmqttd::ModbusExecutor::WRITE_BATCH_SIZE’
   15 |         static constexpr short WRITE_BATCH_SIZE = 10;
      |                                ^~~~~~~~~~~~~~~~
/home/alpy/mqmgateway/libmodmqttsrv/modbus_executor.cpp: In member function ‘void modmqttd::ModbusExecutor::addPollList(const std::map<int, std::vector<std::shared_ptr<modmqttd::RegisterPoll> > >&, bool)’:
/home/alpy/mqmgateway/libmodmqttsrv/modbus_executor.cpp:62:20: warning: structured bindings only available with ‘-std=c++17’ or ‘-std=gnu++17’
   62 |         const auto [q_it, success] = mSlaveQueues.insert({pit.first, ModbusRequestsQueues()});
      |                    ^
make[2]: *** [libmodmqttsrv/CMakeFiles/modmqttsrv.dir/build.make:154: libmodmqttsrv/CMakeFiles/modmqttsrv.dir/modbus_executor.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:176: libmodmqttsrv/CMakeFiles/modmqttsrv.dir/all] Error 2
make: *** [Makefile:130: all] Error 2

I could solve it by adding -DCMAKE_CXX_STANDARD="17" to the cmake command line: $ cmake -DCMAKE_CXX_STANDARD="17" -DWITHOUT_TESTS=1

Another finding was that make install puts the binary into /usr/local/bin while the systemd .service file points to /bin/modmqttd, so I had to edit modmqttd.service accordingly: ExecStart=/usr/local/bin/modmqttd --config=/etc/modmqttd/config.yaml

BlackZork commented 3 months ago

Thanks! :-)

I've just fixed all gcc 9.x related errors. It should compile without the -DCMAKE_CXX_STANDARD="17" flag.

If you want to install mqmgateway alongside system packages, specify the /usr prefix: cmake -DCMAKE_INSTALL_PREFIX:PATH=/usr; otherwise it will go to /usr/local.

As for the global config for delays, I cannot reproduce it. The YAML config parser still lacks good validation, so maybe there is a typo somewhere. If it does not work for you, please post the non-working "modbus" section of your config, where you specify the config values globally.

BlackZork commented 3 months ago

Just found a critical bug in this code and fixed it in 61de12f. Update strongly recommended.