Extend stdconv converters: divide for 4-byte numbers, string conversion

git-developer commented 1 year ago

This PR adds a few functions that I was missing when integrating an energy meter:

Divide for numbers spanning 4 bytes (2 registers)
Conversion from multiple registers to a string

The conversions consider the byte-order (big-endian / little-endian), using the same keyword low_first as the existing converters.

Example config (from a real energy meter, model ALD1D5F from SBC):

  objects:
    - topic: mqmgateway/demo-meter/energy-total
      state:
        register: rtutest.30.28
        register_type: holding
        count: 2
        converter: std.divide(100)
    - topic: mqmgateway/demo-meter/type
      state:
        register: rtutest.30.7
        register_type: holding
        count: 8
        converter: std.string(low_first)

git-developer commented 1 year ago

This PR may be tested with the docker image ckware/mqmgateway:feature-stdconv (multi-arch, based on alpine, including exprtk).

BlackZork commented 1 year ago

Many thanks for your contribution!. Due to dual-licensing I need CLA for this pull request. Please contact me directly at lm@zork.pl. Thanks!

git-developer commented 1 year ago

I sent you a mail.

BlackZork commented 1 year ago

I merged std.divide, added tests for missing cases and fixed some bugs. See d762742.

For divide there is still a problem with uint32 value. I think in near future I would have to introduce positional arguments, so one can write std.divide(5, value_type=uint, precision=3) or just std.divide(5) for default int value and precision=6`

As for string, do you have any example for a device, where string data is encoded in such a strange way as with "low_first" args? An example "12345" encoded as 0x3231 0x3433 0x0035. It looks very strange to swap pair of string bytes in each register separately. Swapped termination byte (0x0) before last data byte looks super strange...

If not, I would rather not add this low_first conversion and support only default one, when string bytes are encoded in subsequent registers from left to right. Looking at the code I this there is a bug in toMqtt when 0x0 is not at last byte in last register. When we encounter 0x0 we should stop processing rest of register data.

git-developer commented 1 year ago

For divide there is still a problem with uint32 value. I think in near future I would have to introduce positional arguments, so one can write std.divide(5, value_type=uint, precision=3) or just std.divide(5) for default int value and precision=6`

Positional arguments would be great. uint32 is not the only type that need special handling. Some energy meters (e.g. Eastron SDM120) use the 32-Bit float dataype (which is currently not supported by mqmgateway). With a value_type argument, divide() could be extended for float later on.

As for string, do you have any example for a device, where string data is encoded in such a strange way as with "low_first" args? An example "12345" encoded as 0x3231 0x3433 0x0035. It looks very strange to swap pair of string bytes in each register separately. Swapped termination byte (0x0) before last data byte looks super strange...

Yes, I agree this looks quite strange. The energy meter SBC ALD1D5F (docs) behaves like this. The string is spanning multiple registers (see registers 7-14 on page 6 in the manual). I assume the device is big-endian, and processing is done per-register internally. The last register (14) contains a single character 0x30 which is 0x0030 in big-endian, so swapping to little-endian gives [0x30, 0x00]. I can post a dump if that helps (but this will not be before end of July).

there is a bug in toMqtt when 0x0 is not at last byte in last register

In my first implementation, I handled this case (null byte not at the end) explicitely, but then removed it later on because the behavior should be the same: A byte sequence that contains a null byte (0x0) in the middle is not a string. The behavior of the current code will return all the characters before the first null byte. For example, the byte array [0x41, 0x00, 0x42, 0x43] (A\0 BC in little-endian) interpreted as string will give A. We could stop processing when the first null byte occurs, but the behavior will be the same. If I don't miss something here and you agree, I could add a test case to make that more clear.

BlackZork commented 1 year ago

Positional arguments would be great. uint32 is not the only type that need special handling. Some energy meters (e.g. Eastron SDM120) use the 32-Bit float dataype (which is currently not supported by mqmgateway). With a value_type argument, divide() could be extended for float later on.

When time allows in the future I also plan to extend exprtk plugin and add functions for int32le, int32be, uint32le, flt32le etc.., so you can use it instead of stdconv:

expr.evaluate("int32le(R1,R2)/100", 3)

As for string, I would rather not mix byte order dedicated for int/double values "low_first/high_first" for string encoding. Better use well known character encoding names where possible: ascii, iso-8851, utf-8, utf-16be, utf-16le etc. The problem is how to name this strange encoding that SBC energy meter does :-)

Looking at manual I understand that we have 8 2-character ascii strings, each encoded on big-endian uint16, that we should concentrate.

Maybe we could name it 'ascii-16be'?

Other idea is to introduce "enum" converter that could do it this way:

    - topic: mqmgateway/demo-meter/type
      state:
        register: rtutest.30.12
        register_type: holding
        converter: 
          name: strenum
          values:
            0x3241: "ALD1D5FD00A2A00"
            0x3341: "ALD1D5FD00A3A00"

It could work in both directions, converting a single uint_16 register value into string and vice versa.

git-developer commented 1 year ago

I also plan to extend exprtk plugin

That sounds like a good idea to me which could make the divide converter obsolete in the end.

I would rather not mix byte order [...] Maybe we could name it 'ascii-16be'?

That's a good idea, too. The name says what the converter does. Do you think the converter should stay a (generic) string converter with an encoding parameter, or do you prefer a dedicated converter for this special encoding?

Other idea is to introduce "enum" converter

I don't get this idea. Think of a different model number, e.g. 3-phase meter ALE3D5FD10C2A00. The string converter would simply read registers 7 to 14, convert them and is done. An enum converter would need a user configuration which knows all model numbers in advance. I don't see a benefit here. Thus I like the string converter idea better.

BlackZork commented 1 year ago

That's a good idea, too. The name says what the converter does. Do you think the converter should stay a (generic) string converter with an encoding parameter, or do you prefer a dedicated converter for this special encoding?

Generic string with encoding=none (default) or ascii-16be (just one arg for now, until positional arguments are implemented). Ideally with some explanation in documentation what it does :-)

git-developer commented 1 year ago

I refactored the string converter as discussed, feedback is welcome.

BlackZork commented 1 year ago

Now I am little bit lost. In When a string with encoding 'none' and with odd character count section you have :

std::vector<uint16_t> registers({0x4C41, 0x3144, 0x0035}); - bytes are swapped.

But in ascii16-be variant you have {0x414C, 0x4431, 0x3500} - natural byte order.

I understood from earlier comment:

The energy meter SBC ALD1D5F (docs) behaves like this. The string is spanning multiple registers (see registers 7-14 on page 6 in the manual). I assume the device is big-endian, and processing is done per-register internally. The last register (14) contains a single character 0x30 which is 0x0030 in big-endian, so swapping to little-endian gives [0x30, 0x00]

.....that mentioned SBC device returns swapped variant and you have to pass ascii16-be to std.string? Is it a mistake?

git-developer commented 1 year ago

I'm confused, too. The code and the tests don't match. In my first commit, bytes were swapped in the converter when parameter low_first was not set, but the corresponding tests check for unswapped bytes. Something seems to be wrong. Unfortunately, I currently can't make tests with the real energy meter device because I'm on a journey until end of July. I'll check the device behavior when I'm back.

BlackZork commented 1 year ago

I think I now what is happening here. Lets take AB string (0x4142)

string is stored in device in such way that when multiple registers are read, it outputs binary data in modbus frame as 0x41 0x42, so if you decode frame as ASCII you will see string data inside.
libmobbus reads modbus frame data into buffer, then for every single register it uses ntohs(0x4142) to get uint16 in host byte order
If we are on little-endian platform, then ntos produces uint16 value 0x4241 - bytes gets swapped. This value is passed to converter.

String converter wants binary data, so it has to revert the conversion done by libmodbus. It is enough to call htons() on every register. It will swap bytes back on i-386/amd64, but do nothing on other big-endian platforms.

If you could test it and confirm if it works on SBC it would be great. I would like to remove the encoding parameter and relevant tests.

Thanks and have a good journey!

git-developer commented 1 year ago

Hmm... it may well be that htons() does the same as the swapByteOrder() function that I added. I will try and report back. But I don't understand why libmodbus should call ntohs(), my use case isn't even TCP but RTU, so there should be no "network order" involved...

Where do you think should htons() be called in the end: in the string converter or somewhere between libmodbus and the converters?

BlackZork commented 1 year ago

Yes, htons does the same as swapByteOrder, but on little endian platform only. Modbus is big endian (big endian is a standard for network protocols), that's why libmodbus have to convert byte pairs after they are received to little endian (on i386/amd64) before returning uint16 value. For string there is no endianness, so you have to put bytes back in original places (as they were sent in modbus packet)

You could just use htons() in swapByteOrder for max portability.

But.... I also looked at libmodbus source code a little and it looks like it does not support big-endian platform and always swap bytes before returning uint16_t value. I think it is not portable to high-endian platform for now. So it should be enough to remove encoding='none' variant and just leave asci16-be variant (always swap).

If you could confirm it on real device it would be great. If it will work as I think it should we can merge just this variant only.

git-developer commented 1 year ago

FYI: There's a wrapper library for libmodbus called libmodbuspp (LGPL). It takes care of the different byte and register orders (just for the case that there's more endian-related work ahead).

BlackZork commented 1 year ago

Thanks. There are some tools in libmodbus too. Some could be useful for float converter. Looking at this file it turns out floats are stored in every possible byte combination out there :-) An maybe it would be better to use byte_order=ABCD,DCBA instead of 'low_first' as for integers.

git-developer commented 1 year ago

I made a test with the real device.

Direct communication over a serial device

Raw bytes:

Request: 1E 03 00 06 00 08 A6 62 (Slave 30, holding registers 7-15)
Response: 1E 03 10 41 4C 44 31 44 35 46 44 30 30 41 33 41 30 30 00 0B 2A (ASCII: ALD1D5FD00A3A00\u0000)

mqmgateway

Environment:

$ uname -m
x86_64

    - topic: mqmgateway/demo-meter/type
      state:
        register: rtutest.30.7
        count: 8
        converter: std.string(...)

Leaving the bytes untouched
- YAML: converter: std.string()
- MQTT: mqmgateway/demo-meter/type/state "LA1D5DDF003A0A\u00000"
Using ntohs or htons:
- YAML: converter: std.string(ascii16-be)
- MQTT: mqmgateway/demo-meter/type/state "ALD1D5FD00A3A00\u0000"

My thoughts:

Your assumption is confirmed: the device sends characters as ASCII in "MSB first" (big-endian) order, and libmodbus seems to change the order.
Since libmodbus swaps the bytes from network to host order, the reversing function should be htons and not ntohs (although the effect will be the same).
The trailing 0x00 byte looks ugly but seems to be exactly what's on the wire.

I will remove the encoding configuration, change ntohsto htons, add an option to limit the string size and change the docs accordingly.

git-developer commented 1 year ago

Thanks!

BlackZork commented 1 year ago

Thanks for your time and work on this!

FYI I started to work on std.float32 in separate branch.

git-developer commented 1 year ago

Oh, that's nice to hear. I also started an implementation two weeks ago, but then switched to exprtk. I don't need float support in stdconv anymore now, because the exprtk converter is able to read floats, and read-only is sufficient for my use case.

I just pushed the current state of my float branch. If you think it that it could be useful for you, I can open a PR and you may continue working on it. Otherwise I will delete the branch later on.

BlackZork / mqmgateway

Extend stdconv converters: divide for 4-byte numbers, string conversion #21

Direct communication over a serial device

mqmgateway