arjenhiemstra / ithowifi

Itho wifi add-on module (ESP32 wifi to itho I2C protocol)
GNU General Public License v3.0
170 stars 29 forks source link

long integer truncated in status JSON #235

Closed tomkooij closed 4 months ago

tomkooij commented 4 months ago

Describe the bug The WPU 5G (at least firmware 37) has the UTC time in long integer format (unix time ie 1709740263) However in the status JSON and on the status page the UTC time status value is always "1".

I have D_LOG()ed the value and it is correctly read from the status i2c string as a 4 byte integer converted to the correct value and stored into IthoStat.value.intval in: https://github.com/arjenhiemstra/ithowifi/blob/master/software/NRG_itho_wifi/main/IthoSystem.cpp#L831

It appears to be truncated by ArduinoJSON when converting to JSON. The value is wrong in the MQTT status JSON. (value = '1')

To Reproduce Steps to reproduce the behaviour:

  1. Go to status page on WPU 5G
  2. The UTC-time value is always 1
  3. In the i2c string (2401 on debug page) the correct UTC unix time is visible.

Expected behaviour The value UTC time should be the number of seconds since 1970, not 1.

Device information

Debug logging Please provide debug logging from the debug page if possible:

Below is the output of:

 if (tempVal > 1000000)
              { D_LOG("time: %d", ithoStat.value.intval);}

https://github.com/arjenhiemstra/ithowifi/blob/master/software/NRG_itho_wifi/main/IthoSystem.cpp#L831

Mar  6 15:43:09 nrg-itho-c2a8 nrgitho time: 1709740160
Mar  6 15:43:20 nrg-itho-c2a8 nrgitho time: 1709740173
Mar  6 15:43:50 nrg-itho-c2a8 nrgitho time: 1709740203
Mar  6 15:44:20 nrg-itho-c2a8 nrgitho time: 1709740233
Mar  6 15:44:50 nrg-itho-c2a8 nrgitho time: 1709740263
Mar  6 15:45:20 nrg-itho-c2a8 nrgitho time: 1709740293
Mar  6 15:45:50 nrg-itho-c2a8 nrgitho time: 1709740323

(The internal clock of my WPU is above 5 minutes in front of UTC time)

I'm convinced the IthoStatus object has the correct value. It must be truncated by ArduinoJSON I have tried to enable ARDUINOJSON_USE_LONG_LONG but that should be already enabled on ESP32. It made no difference. (EDIT: These are 32 bits unsigned integers, so long long should not matter)

tomkooij commented 4 months ago

@arjenhiemstra I just found some code which casts ints to int32_t. (git blame points to me)

EDIT: at least IthoDeviceStatus::int_val is int32_t https://github.com/arjenhiemstra/ithowifi/blob/master/software/NRG_itho_wifi/main/IthoSystem.h#L54

It should be an easy fix. Will look into it soon.

tomkooij commented 4 months ago

Doesn't seem an easy fix... :(

When I change IthoDeviceStatus::int_val to int64_t or uint32_t the value of UTC in both status html page and MQTT status JSON is still "1". When I D_LOG() the value to syslog the correct value is shown. The value is correctly read from the status i2c message and correctly stored in IthoStatus.

It seems to be problem with serialising the JSON. But I cannot find what is going wrong. (This is a 4 byte uint32, not even a 64bit long)

TD-er commented 4 months ago

Please be aware that there isn't a standard String function for 64-bit ints. So maybe the compiler casts it it to a bool.

You could simply test by casting it to uint32_t to see if this will work out.

arjenhiemstra commented 4 months ago

String is not used in this case. The (u)int32_t value is a possible value of a union which is part of an own data struct:

https://github.com/arjenhiemstra/ithowifi/blob/3408028bafca6ec840d1bee8a5dda48b5319a5be/software/NRG_itho_wifi/main/IthoSystem.h#L51-L58

The is fed to ArduinoJSON directly. Which should support 32bit signed/unsigned out of the box.

I read that the consensus is for time_t to be signed 32bit. Is that also the way it is processed from the input i2c 2401 result?

Could you post the i2c results from the debug page?

TD-er commented 4 months ago

Hmm using a union for types of different size is not really defined. So be very very careful when using these kinds of union types.

arjenhiemstra commented 4 months ago

according to: https://en.cppreference.com/w/cpp/language/union

"The union is at least as big as necessary to hold its largest data member"

which in this case should be 8 bytes for the double on esp32. All other possible values should fit in these 8 bytes. I do not see/understand where the undefined behaviour arrises from.

TD-er commented 4 months ago

You don't know where the members will be located. And since neither of the members is initialized, the values can be literally anything, even if you initialized a single one of them.

tomkooij commented 4 months ago

OMG this is so stupid: https://github.com/arjenhiemstra/ithowifi/blob/master/software/NRG_itho_wifi/main/devices/wpu.h#L588 (keep in mind these things are autogenerated)

There is are two values utc-time. So the correct one is just overwritten by the other (boolean) one...

It's fixed now. I'll submit a PR. But need to cook diner now ;-) image

(In the PR i'll also fix the int32_t / uint32_t thing.

arjenhiemstra commented 4 months ago

You don't know where the members will be located. And since neither of the members is initialized, the values can be literally anything, even if you initialized a single one of them.

I still do not get it (and I really would like to understand it and I know that you are very knowledgable so I want to take it seriously)

According to the same union documentation as mentioned earlier: "all non-static data members have the same address"

Which would mean (imo) that the start address of the data we are interested in (being a byte, double, uint etc) would all start at the same union address. It is then up the the programmer to make sure to interpreted the bytes located at that starting address until the length of the type (1 to 8 bytes in this case) correctly and fitting for the data type expected.

In code first the data type is set in a separate value in the struct here:

https://github.com/arjenhiemstra/ithowifi/blob/3408028bafca6ec840d1bee8a5dda48b5319a5be/software/NRG_itho_wifi/main/IthoSystem.h#L43-L49

So each instance of the struct: https://github.com/arjenhiemstra/ithowifi/blob/3408028bafca6ec840d1bee8a5dda48b5319a5be/software/NRG_itho_wifi/main/IthoSystem.h#L40-L63

has at least a data type set (that happens first before the data at the union's mem location is being handled, I'm fully aware it does not happen on struct initialisation) for the union in that instance of the struct. These instances are then emplaced on a vector and kept in memory.

This set datatype determines how the rest of the code should interpreted the data at union mem &x. The bytes are read from the start address until the length of the data type (1 to 8 bytes) and then interpreted as instructed by the set data type (being a byte, double, uint etc).

The only undefined behaviour I can think of is when the data type is set but the actual data has not been written to the union. The data in mem (which could be anything atm) would be interpreted as instructed but especially for a const char * this could lead to a crash I can imagine. A byte, (u)int would probably only lead to a wrong value, I'm not sure how a double would be handled, have not looked into that to be honest.

I love to hear your thoughts about it!

arjenhiemstra commented 4 months ago

and of course the struct could be changed to something like this:

struct ithoDeviceStatus
{
  const char *name;
  enum : uint8_t
  {
    is_byte,
    is_int,
    is_float,
    is_string
  } type;
  uint8_t length;
  union
  {
    byte byteval;
    int32_t intval;
    uint32_t uintval;
    double floatval;
    const char *stringval;
  } value{.byteval = 0};
  uint32_t divider;
  uint8_t updated;
  bool is_signed;
  ithoDeviceStatus() : name(nullptr), type(is_byte), length(1), divider(1), updated(0), is_signed(false){};
};

to make 100% sure that at creating instance all occupied memory has a known value (expect for the 7 padding bytes in the union). But I think that this will still not circumvent the possible issue of setting the type to is_string and then not writing the correct bytes to the union's mem location and trying to let the code interpret it as a cons char * for example.

TD-er commented 4 months ago

Let's take that text you quoted from:

The union is at least as big as necessary to hold its largest data member, but is usually not larger. The other data members are intended to be allocated in the same bytes as part of that largest member. The details of that allocation are implementation-defined, except that all non-static data members have the same address. It is undefined behavior to read from the member of the union that wasn't most recently written. Many compilers implement, as a non-standard language extension, the ability to read inactive members of a union.

The most important caveats in this alinea are marked as bold. I'm not sure if the "same address" part is always enforceable, but I guess it will probably be with this example struct.

I don't know what will happen if you make a union using bitwise members (struct { uint8 bla1:4; uint8 bla2:4; } bla; vs uint16_t bla:8; for example)

The "most recently written" part is probably not an issue in any implementation upto 32 bit (on a 32-bit CPU), but I'm not sure how it may work on larger members of the union.

And if you like to initialize the struct, I would set the pointer to null_ptr as that will also set the byteval to 0.

arjenhiemstra commented 4 months ago

this: struct { uint8 bla1:4; uint8 bla2:4; } bla;

is a struct with two distinct memory locations for 2 uint8 values.

whereas in my example this would translate to something like: struct { uint8_t type, union { uint8 bla1:4; uint8 bla2:4; } uni } bla;

here we would still have two distinct memory locations but in that case one unit8 for type and one unit8 for the value stored in union uni (which is this example is both an uint8, a bit unusual application of a union of course)

it is the sentence after the part you made bold that sets me in the direction I used "except that all non-static data members have the same address". The union in my case consists of non-static data members, if I understand correctly.

And combined with what is in ISO/IEC 9899: " The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members "

this should result in behaviour like I described earlier (at least I certainly hope so)

And isn't "undefined behavior" [...] "most recently written" meant to be the explanation of a situation where you for example last wrote data to union member uintval and try to access member ie. intval thereafer?

edit; to elaborate a bit more on the 32/64 bit CPU, I think this should not matter.

for example:

union {
uint8_t a,
int32_t b,
int64_t c,
char string[24]
} value;

value should occupy a minimum of 24 bytes of memory space for each instantiation of the union, in-depended of CPU arch. and for any instance of the union &a == &b == &c == &string[0].

TD-er commented 4 months ago

this: struct { uint8 bla1:4; uint8 bla2:4; } bla;

is a struct with two distinct memory locations for 2 uint8 values.

Nope, the sizeof of that struct should be 1 byte. The funny thing is you can even mix signed and unsigned here, thus putting the bit for the minus sign at an unusual position.

struct { uint8 bla1:4; int8 bla2:4; } bla;

This has two individually accessible 4-bit variables so you don't need to perform bitmasks to get data from them or set it. And bla2 is signed so you can set values from -8 ... 7 to it.

whereas in my example this would translate to something like: struct { uint8_t type, union { uint8 bla1:4; uint8 bla2:4; } uni } bla;

Nope, the opposite:

union{ uint8_t type, struct { uint8 bla1:4; uint8 bla2:4; } uni } bla;

Otherwise the bla1 and bla2 would share the same memory bits.

here we would still have two distinct memory locations but in that case one unit8 for type and one unit8 for the value stored in union uni (which is this example is both an uint8, a bit unusual application of a union of course)

You can create all kinds of bitwise structs. Just be aware of using signed vs. unsigned as that has bitten me way too often :)

See for example this union I use in my WiFi code:

  union 
  {
    struct {
      uint16_t isHidden:1; // Hidden SSID
      uint16_t lowPriority:1; // Try as last attempt
      uint16_t isEmergencyFallback:1; 
      uint16_t phy_11b:1; 
      uint16_t phy_11g:1; 
      uint16_t phy_11n:1; 
      uint16_t phy_lr:1; 
      uint16_t phy_11ax:1; 
      uint16_t wps:1; 
      uint16_t ftm_responder:1; 
      uint16_t ftm_initiator:1; 

      uint16_t unused:5;      
    };
    uint16_t flags{};
  };

But it is perfectly fine to user more bits to store some enum value for example.

Just be aware that the order of the bits may feel a bit counter-intuitive, so always test/check when you try to implement some struct to match some register definition from a datasheet.

it is the sentence after the part you made bold that sets me in the direction I used "except that all non-static data members have the same address". The union in my case consists of non-static data members, if I understand correctly.

And combined with what is in ISO/IEC 9899: " The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members "

Yep the sizeof the union struct is just what you'd expect, 8 bytes. It can sometimes be padded if the largest element of the union isn't a multiple of 4 bytes, but that's compiler dependent.

Also most compilers (all???) will complain if you ever use a union type in a packed struct, so it is unlikely you will end up with unaligned members like an uint32_t starting at an address which is not a multiple of 4.

this should result in behaviour like I described earlier (at least I certainly hope so)

And isn't "undefined behavior" [...] "most recently written" meant to be the explanation of a situation where you for example last wrote data to union member uintval and try to access member ie. intval thereafer?

Most 32-bit CPUs will load/store per 32 bit. So any update to union members of at most 32 bit will very likely update all bits and even if it isn't updated yet at that memory location but still in the cache or in some register, those will work as expected when you write to one and read the other. However if you exceed this 32-bit boundary, you may see odd behavior.

edit; to elaborate a bit more on the 32/64 bit CPU, I think this should not matter.

for example:

union {
uint8_t a;
int32_t b;
int64_t c;
char string[24];
} value;

value should occupy a minimum of 24 bytes of memory space for each instantiation of the union, in-depended of CPU arch. and for any instance of the union &a == &b == &c == &string[0].

Yep, all true, but now consider this:

int64_t test() {
 union {
  uint8_t a;
  int32_t b;
  int64_t c;
  char string[24];
 } value;

 memset(&value, 0, sizeof(value)); // May be optimized out by the compiler, can lead to very tricky situations and bugs!

 value.a = 123;
 value.b = value.a; // Just to make sure the compiler actually does something with the assignment
 for (size_t i = 1; i < sizeof(value.string); ++i) { value.string[i] = 0; } // Make sure the compiler doesn't optimize out all other assignments but still needs to reload some registers
 value.a = 123;
 value.c = 124;

 return value.a;
}

Strictly speaking it is compiler dependent what will be returned. One would expect this function to return 0, but given it is left to the compiler implementation, it is possible the function will return 123. Also the optimization steps done by the compiler may remove all assignments except for the last one. Or the compiler may consider it to be the same address and actually returning value.c (thus returning 124) without performing a cast. But that would be a compiler bug IMHO.

TD-er commented 4 months ago

And just to illustrate why it might be tricky using unions..... I think exactly the union I posted in my code is causing issues as I try to clear all bits in the constructor by setting the flags to 0. But the separate bits still appear not to be set. Only when I explicitly set the bits myself, they are set.

So in my own code, I still need to explicitly set the flags myself:

WiFi_AP_Candidate::WiFi_AP_Candidate(uint8_t networkItem) : index(0), flags(0) {
  // Need to make sure the phy isn't known as we can't get this information from the AP
  // See: https://github.com/letscontrolit/ESPEasy/issues/4996
  // Not sure why this makes any difference as the flags should already have been set to 0.
  phy_11b = false;
  phy_11g = false;
  phy_11n = false;
  wps     = false;
[...]

As I was already looking into this myself, I was focussed on issues with unions, so be warned :)

arjenhiemstra commented 4 months ago
  union 
  {
    struct {
      uint16_t isHidden:1; // Hidden SSID
      uint16_t lowPriority:1; // Try as last attempt
      uint16_t isEmergencyFallback:1; 
      uint16_t phy_11b:1; 
      uint16_t phy_11g:1; 
      uint16_t phy_11n:1; 
      uint16_t phy_lr:1; 
      uint16_t phy_11ax:1; 
      uint16_t wps:1; 
      uint16_t ftm_responder:1; 
      uint16_t ftm_initiator:1; 

      uint16_t unused:5;      
    };
    uint16_t flags{};
  };

my point is: your struct in a union from above example is something else than a union in a struct (my example / the piece of code from this project). You are accessing bits (missed that in my earlier reaction, sorry), I'm trying to get whole types from the union.

as per this part:

`int64_t test() { union { uint8_t a; int32_t b; int64_t c; char string[24]; } value;

memset(&value, 0, sizeof(value)); // May be optimized out by the compiler, can lead to very tricky situations and bugs!

value.a = 123; value.b = value.a; // Just to make sure the compiler actually does something with the assignment for (size_t i = 1; i < sizeof(value.string); ++i) { value.string[i] = 0; } // Make sure the compiler doesn't optimize out all other assignments but still needs to reload some registers value.a = 123; value.c = 124;

return value.a; }`

This example quite clearly explains what I'm trying to convey and as I understand documentation this should happen:

value.c = 124; would set the active member of the union value to member c with a value of 124, as it is a int64_t the binary representation of value.a in mem should then be: 00000000000000000000000001111100

if you (illegally) access the non active union member value a (return value.a;) most (?) compilers would interpretet the data at mem address of a as uint8_t, which is 01111100 which is 124 in decimal. And thus return the value 124.

I've just done a quick test with GCC and Clang, both return the result as described above. image

TD-er commented 4 months ago

It depends on the endianness of the platform where the bits will be set.

And as I mentioned in my previous reply it really makes a difference when you clear all bytes via one member and try to read from another. As I literally now have this bug which occurs on ESP8266 with SDK2.7.4 where I set the bits all to 0 in the constructor and then not set those bits explicitly again. When accessing the individual bits, they are not cleared.

The funky part is that I also do something very similar in another union which totals as 32 bit (and is guaranteed to be 32-bit aligned) and this does work just fine for years.

arjenhiemstra commented 4 months ago

Very interesting investigation, thanks!

It depends on the endianness of the platform where the bits will be set.

sure, just left that out to keep things simple and because I think we both understand how much that will be of influence within a particular system architecture.

And as I mentioned in my previous reply it really makes a difference when you clear all bytes via one member and try to read from another.

According to doc mentioned earlier: "It is undefined behavior to read from the member of the union that wasn't most recently written."

So OK, I think the 'puzzlement' here drills down to the difference in bit-field based members of a union or struct (with defined bit width) in your examples and the, type based, non-static union members in my code. According to the above statement only reading data from the most recent written member is defined behaviour, reading from any other member is undefined. But that might not be the case for bit-field based structures, cannot find a clear definition of that in the C docs. Also: "Each non-bit-field member of a structure or union object is aligned in an implementation defined manner appropriate to its type."

the code I initially shared:

https://github.com/arjenhiemstra/ithowifi/blob/3408028bafca6ec840d1bee8a5dda48b5319a5be/software/NRG_itho_wifi/main/IthoSystem.h#L51-L58

adheres to this and is also very much similar in setup and use as the example here: https://en.cppreference.com/w/cpp/language/union

#include <iostream>

// S has one non-static data member (tag), three enumerator members (CHAR, INT, DOUBLE), 
// and three variant members (c, i, d)
struct S
{
    enum{CHAR, INT, DOUBLE} tag;
    union
    {
        char c;
        int i;
        double d;
    };
};

void print_s(const S& s)
{
    switch(s.tag)
    {
        case S::CHAR: [std::cout](http://en.cppreference.com/w/cpp/io/cout) << s.c << '\n'; break;
        case S::INT: [std::cout](http://en.cppreference.com/w/cpp/io/cout) << s.i << '\n'; break;
        case S::DOUBLE: [std::cout](http://en.cppreference.com/w/cpp/io/cout) << s.d << '\n'; break;
    }
}

int main()
{
    S s = {S::CHAR, 'a'};
    print_s(s);
    s.tag = S::INT;
    s.i = 123;
    print_s(s);
}

So I think we can (counting on good compiler implementation surely) conclude that this statement:

Hmm using a union for types of different size is not really defined.

Seems not applicable in the case of my example first given where the union members are non-static data types without bit-fields?

TD-er commented 4 months ago

Well, let's rephrase it to: "Pay very close attention when using union" And now I will continue fixing the bugs in my own code due to use of union.... ;) (at least on ESP8266 it doesn't behave as expected and it seems to differ between Linux en Windows builds)

arjenhiemstra commented 4 months ago

Well, let's rephrase it to: "Pay very close attention when using union"

👍 As we should always do!

And now I will continue fixing the bugs in my own code due to use of union.... ;) (at least on ESP8266 it doesn't behave as expected and it seems to differ between Linux en Windows builds)

Hats off and good luck! Hope to see you soon again @ De Maakplek!