fredlcore / BSB-LAN

LAN/WiFi interface for Boiler-System-Bus (BSB) and Local Process Bus (LPB) and Punkt-zu-Punkt Schnittstelle (PPS) with a Siemens® controller used by Elco®, Brötje® and similar heating systems
222 stars 84 forks source link

[BUG] Datalog contains values with date 00.01.1900 #530

Closed CZvacko closed 1 year ago

CZvacko commented 1 year ago

BSB-LAN Version happen in 3.0.3-20221119003550, was ok in 2.2

Architecture ESP32

Bus system BSB

Describe the bug Datalog contains values with date 00.01.1900 in random occurrences. Approximately 0.1% of the rows are wrong. This also causes a problem when plotting the graph (shows redundant horizontal lines).

To Reproduce Steps to reproduce the behavior:

  1. Go to setting
  2. Click on Logging mode and enable Write to SD card
  3. Collect some data...

Logs Fragments of datalog: 447965602;30.11.2022 21:40:00;8770;TO2 teplota prostoru;21.7;°C 447966023;00.01.1900 55105:78;8830;TV Skutečná teplota B3;45.5;°C 448116251;30.11.2022 21:42:30;8325;Aktuální řízení ventilátoru;0.0;%

517293611;01.12.2022 16:55:28;8325;Aktuální řízení ventilátoru;21.0;% 517293820;00.01.1900 46221:87;8326;Modulace hořáku;17;% 517294038;01.12.2022 16:55:28;8700;Venkovní teplota;0.7;°C

Expected behavior Real date is logged

Wrong date Wrong date.csv

fredlcore commented 1 year ago

Oh, that's strange... @dukess, would it make sense to check if GetDateTime returns zero and then repeat that function call? Because even if the clock is not set properly to current date, it shouldn't return 0, I guess?

dukess commented 1 year ago

May buggy version is used: 00.01.1900 should be 01.01.1900 at least

fredlcore @.***> 3 декабря 2022 г. 10:30:51 написал:

Oh, that's strange... @dukess, would it make sense to check if GetDateTime returns zero and then repeat that function call? Because even if the clock is not set properly to current date, it shouldn't return 0, I guess?

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.

fredlcore commented 1 year ago

With buggy version you mean older version of BSB-LAN?

dukess commented 1 year ago

Yep. With issue 527 https://github.com/fredlcore/BSB-LAN/issues/527

fredlcore @.***> 3 декабря 2022 г. 12:34:55 написал:

With buggy version you mean older version of BSB-LAN? — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

CZvacko commented 1 year ago

Yep. With issue 527 #527 fredlcore @.> 3 декабря 2022 г. 12:34:55 написал: With buggy version you mean older version of BSB-LAN? — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.>

In my local version I edited esp32_time.h and removed the "- 1" on line 9 with "t.tm_mday = day". Even before issue 527 was fixed officially.

fredlcore commented 1 year ago

Could you nevertheless download the most recent version from the master repository and try it again? There may be other issues that we fixed since then that may have an impact on it, so just to be sure...

CZvacko commented 1 year ago

Could you nevertheless download the most recent version from the master repository and try it again? There may be other issues that we fixed since then that may have an impact on it, so just to be sure...

Today morning I downloaded the most recent version, flashed, then issue D0 command. Now after 12 hours (273 data readings), I have 2 rows with bad date (and graph corrupted).

fredlcore commented 1 year ago

Ok, thanks for testing! Could you change the line

    getLocalTime(&now,0);

in BSB_LAN.ino into

  do {
    getLocalTime(&now,0);
  } while (now.tm_year < 2);

and test if the problem remains?

@dukess: I have no clue why this function is coded as it is:

bool getLocalTime(struct tm * info, uint32_t ms)
{
    uint32_t start = millis();
    time_t now;
    while((millis()-start) <= ms) {
        time(&now);
        localtime_r(&now, info);
        if(info->tm_year > (2016 - 1900)){
            return true;
        }
        delay(10);
    }
    return false;
}

Why is there a while loop that repeatedly tries to calculate the current date/time? For some reason it seems likely that the date calculation goes wrong, at least that would explain the check for tm_year being larger than (2016-1900=)116. However, if it is likely to fail due to certain circumstances, it may be better to set the timeout (that this function actually provides for) not to 0 but to something like 10 or 100 or so.

So while thinking about it, @CZvacko, maybe you could first start by changing

getLocalTime(&now,0);

to

getLocalTime(&now,100);

and see if that does the trick.

dukess commented 1 year ago

Same thing: it is unclear behavior for me. But we have no other way now except adding delay to getLocalTime()

Unfortunately i can't research it until weekend.

fredlcore @.***> 4 декабря 2022 г. 21:49:16 написал:

Ok, thanks for testing! Could you change the line

getLocalTime(&now,0);

in BSB_LAN.ino into

do { getLocalTime(&now,0); } while (now.tm_year < 2);

and test if the problem remains?

@dukess: I have no clue why this function is coded as it is:

bool getLocalTime(struct tm * info, uint32_t ms) { uint32_t start = millis(); time_t now; while((millis()-start) <= ms) { time(&now); localtime_r(&now, info); if(info->tm_year > (2016 - 1900)){ return true; } delay(10); } return false; }

Why is there a while loop that repeatedly tries to calculate the current date/time? For some reason it seems likely that the date calculation goes wrong, at least that would explain the check for tm_year being larger than (2016-1900=)116. However, if it is likely to fail due to certain circumstances, it may be better to set the timeout (that this function actually provides for) not to 0 but to something like 10 or 100 or so.

So while thinking about it, @CZvacko, maybe you could first start by changing

getLocalTime(&now,0);

to

getLocalTime(&now,100);

and see if that does the trick.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.

DE-cr commented 1 year ago

Reminds me of bogus values I've received from a NTP server every now and then, when synchronizing the time in one of my sketches - which the NTPclient library I've used sadly doesn't handle. My workaround was to do a couple of retries in those cases, similar to what is suggested here. I'm not sure whether the cause of this in my case was a weak network connection (UDP over WLAN), or what else it was. ...and I must confess that I don't even know whether BSB-LAN in this setup uses NTP or RTC.

fredlcore commented 1 year ago

No, we use neither. The time is taken from the heater or set manually. The library just keeps track of the time and apparently doesn't always return reliable values. That's why @CZvacko needs to try my workaround.

dukess commented 1 year ago

@DE-cr The description does not say that the GetLocalTime() function implicitly calls or waits for NTP. But it looks exactly like this. Another reason is race condition. I want to try to rewrite the current date query using the gettimeofday() function, but I have no confidence that it will get better.

On my system, this problem does not show itself.

DE-cr commented 1 year ago
  1. getLocalTime() returns a bool value to signal success - which BSB_LAN.ino does not evaluate.
  2. This may (or may not) help: https://forum.arduino.cc/t/esp32-time-library-sync-interval/902588
fredlcore commented 1 year ago

Yes, getLocalTime() returns a status, but BSB-LAN does not evaluate it because we're not making use of NTP - and none of the code in the above-quoted function should. We set the time either based on the time of the heater or it's 1.1.1900. But since the date is set correctly in the logs at first and the error only occurs after hours or days, it doesn't make sense unless the ESP32 calls an NTP server unknown to me. That should not happen for various reasons or should at least be configurable.

CZvacko commented 1 year ago

I just flashed device with "getLocalTime(&now,100)" Now need wait few hours/days to see if happen again.

fredlcore commented 1 year ago

Thanks!

fredlcore commented 1 year ago

Any update, @CZvacko?

CZvacko commented 1 year ago

Yes, it's fine now. I waited a few days to be sure.

fredlcore commented 1 year ago

Thanks, goof to know! We'll change it according to your suggestion.