arduino / ArduinoCore-API

Hardware independent layer of the Arduino cores defining the official API
https://www.arduino.cc/reference/en/
GNU Lesser General Public License v2.1
215 stars 119 forks source link

Rewrite IPAddress without union #225

Open safocl opened 1 year ago

safocl commented 1 year ago

Rewrite IPAddress without union

Using a union allowed undefined behavior for 8-bit integer INPUT and OUTPUT arguments through a 32-bit integer (and vice versa). Write through union::uint8_t[16] and read through union::uint32_t[4] is an undefined behavior.

C++ standard says (https://timsong-cpp.github.io/cppwp/n4659/class.union#1):

At most one of the non-static data members of an object of union type can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time. [ Note: One special guarantee is made in order to simplify the use of unions: If a standard-layout union contains several standard-layout structs that share a common initial sequence, and if a non-static data member of an object of this standard-layout union type is active and is one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of the standard-layout struct members; see [class.mem].  — end note ]"

and (https://timsong-cpp.github.io/cppwp/n4659/class.mem#22):

Two standard-layout unions are layout-compatible if they have the same number of non-static data members and corresponding non-static data members (in any order) have layout-compatible types.

you can't hope that compilers don't seem to do undefined behavior at the moment

Also created https://github.com/espressif/arduino-esp32/pull/8829 PR dependent on changing the current API.

CLAassistant commented 1 year ago

CLA assistant check
All committers have signed the CLA.

codecov-commenter commented 12 months ago

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (82a5055) 95.53% compared to head (d85523f) 95.42%.

Files Patch % Lines
api/IPAddress.cpp 95.55% 2 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #225 +/- ## ========================================== - Coverage 95.53% 95.42% -0.11% ========================================== Files 16 16 Lines 1075 1050 -25 ========================================== - Hits 1027 1002 -25 Misses 48 48 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

aentinger commented 11 months ago

I have problems with all those heavy changes in this PR, most of which seem to be motivated by using C++ standard library containers and patterns.

Don't you think the problem can be addressed by enforcing a packed union, i.e. something like this

    union {
        uint8_t bytes[16];
        uint32_t dword[4];
    } _address __attribute__((packed, aligned(4)));

Although I have not invented this particular instance I may add that I have used (and use) this pattern repeatedly over the years, most recently here and here. I have yet to experience undefined behaviour.

TL;DR: No to heavy changes including introduction of C++ STL types, lighter tough solutions are preferred.

safocl commented 11 months ago

I have problems with all those heavy changes in this PR, most of which seem to be motivated by using C++ standard library containers and patterns.

Don't you think the problem can be addressed by enforcing a packed union, i.e. something like this

    union {
        uint8_t bytes[16];
        uint32_t dword[4];
    } _address __attribute__((packed, aligned(4)));

Although I have not invented this particular instance I may add that I have used (and use) this pattern repeatedly over the years, most recently here and here. I have yet to experience undefined behaviour.

TL;DR: No to heavy changes including introduction of C++ STL types, lighter tough solutions are preferred.

This trouble with using of union is a undefined behavior when value was written as uint8_t[16] and read as uint32_t[4] -- it are not compatible layouts. But struct { int x, y; } and int[2] is layout-compatible -- it can already be used in conjunction with each other.

std::array<> used in this PR for STL-style only. it doesn't make any problems. Using of the c-style arrays are bad practice.

safocl commented 11 months ago

this functionality can be implemented through a C-style array

sgryphon commented 11 months ago

Thanks for changing back to C-style array -- that is a separate change (from just removing the union).

I think removing the union makes a lot of sense, as it added very little benefit.

In practice it probably has the same result (even if accessing through alternate union members has unspecified behaviour, I think in most cases they would end up in the overlapping memory and just work). But then in practice, I don't know how often the 32-bit word variations are actually used.

An IPv6 would rarely be handled as 32-bit words... either bytes or 16-bit chunks (how it is displayed) makes sense.

For an IPv4 address there actually is a defined representation as a 32-bit integer, e.g. http://2398766798 is the same as http://142.250.70.206, and that may have been what the original intention was.

However on little endian systems the octets 0x83, 0xFA, 0x46, 0xCE, in that order, will form the integer (uint32_t) 3460758158, so while each 4-byte part of the array can be accessed as a 32-bit value, the result is platform byte-order dependent, and may not necessarily match with the usage of integers representations of IPv4 address.

It would be interesting to know what the usage scenario is for accessing as 32-bit words, and whether that whole section could be deprecated and eventually removed.

In the meanwhile, removing the union seems good.

safocl commented 11 months ago

In practice it probably has the same result (even if accessing through alternate union members has unspecified behavior, I think in most cases they would end up in the overlapping memory and just work).

It is undefined behavior, not unspecified. Any compiler can at any time assume that there is no undefined behavior and compile the program in such a way that a certain number of instructions (code) have been optimized to eliminate undefined behavior. That is, literally anything can happen - including all logic being cut out, since only in this case will it be possible to achieve the absence of undefined behavior - which the compiler will happily do.