fredlcore / BSB-LAN

LAN/WiFi interface for the Boiler System Bus (BSB), the Local Process Bus (LPB) and the Punkt-zu-Punkt Schnittstelle (PPS, point-to-point interface) of Siemens® controllers used by Elco®, Brötje® and similar heating systems

[FEATURE REQUEST] Modifying the graph to allow the long-term data logging #539

Closed CZvacko closed 1 year ago

CZvacko commented 1 year ago

If the user collects values over a long period of time (weeks, months) and with a short interval (150s), then working with the chart is quite slow. Any zooming or other operation takes a long time, for example for data collected for 2 weeks (50000 rows).

Maybe if the application could split the data by weeks (and allow the user to select a week from a drop-down list when displaying it), that would be better, and it could collect data even for a whole year.

Currently I have to reset the datalog every 2 weeks.

fredlcore commented 1 year ago

Hm, I was 100% sure that we had an open ticket for this request already, but I can't find it right now, so I'll leave this open until I find the other one. Otherwise, we'll continue here. In short: Yes, that would be a very handy feature, but since there is no fully working SQL(ite) or other kind of database available on the Due/ESP32, it's difficult to implement. If someone has ideas on how to efficiently purge data older than a certain timeframe, I'm happy to hear more about it.

CZvacko commented 1 year ago

The values are currently read from a file, not from memory, right? My idea was to write a separate file for each week, e.g. datalog_w01.txt, datalog_w02.txt. Then just plot the desired file (datalog_w01.txt ...), no need for a dynamic date-range selection. Maybe some users will prefer to keep the current way, so a setting could be added to let the user choose between a single log file and files separated by weeks. Also, there should be an option to download all log files at once.

fredlcore commented 1 year ago

Yes, they are read from a file. Separating the files by weeks is not too feasible, I think, because then you can't see, for example, what happens from Sunday over to Monday in one graph. Time intervals are usually more intuitive and could easily be adjusted to the users' needs (7 days, 1 month etc.). The problem is the "maintenance work" that would have to be done at a given time, where old log entries are discarded. In SQL(ite3) that's really easy and fast, but there is no full implementation for microcontrollers yet.
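For illustration, that maintenance step is just a couple of statements in SQLite; a minimal sketch using the standard sqlite3 C API (table and column names are made up):

```cpp
#include <sqlite3.h>  // e.g. from one of the ESP32 SQLite ports

// Drop everything older than 30 days, then shrink the database file.
// "datalog" and "logged_at" are hypothetical names for illustration.
void purgeOldRows(sqlite3 *db) {
  sqlite3_exec(db,
               "DELETE FROM datalog WHERE logged_at < datetime('now','-30 days');"
               "VACUUM;",
               nullptr, nullptr, nullptr);
}
```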

CZvacko commented 1 year ago

Yes, sql would be ideal, but still better separate weekly logs than nothing. Concerning logging over the next week,, if someone will expect it in advance, then he can switch the (proposed) setting to use continuous logging.

DE-cr commented 1 year ago

The JavaScript libraries used for /DG displays (C3+D3) are indeed slow when dealing with larger datasets. The older /DG implementation (using D3 only) might be faster. You can use it instead of the C3+D3 version by preceding #define USE_ADVANCED_PLOT_LOG_FILE in BSB_LAN_config.h with //.
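For reference, that is a one-character-pair change in the config (sketch; only the comment marker is added):

```cpp
// In BSB_LAN_config.h: comment this out to fall back to the older,
// potentially faster D3-only /DG implementation:
//#define USE_ADVANCED_PLOT_LOG_FILE
```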

One easy (?) hack with the C3+D3 version might be to filter the /D data in its JavaScript right after reading, e.g. by applying a regular expression passed in via the URL, like /DG?DateFilterRegEx=1.\.02\.2023 to match 10.-19. Feb 2023 only. However, this seems hardly user friendly. Providing a more usable way to specify the time frame for filtering is conceivable, but would require more work.

My personal solution to this problem has been to keep log intervals long and/or log periods short. :)

fredlcore commented 1 year ago

Hm, the filter is actually quite a nice idea, because if we use dropdowns or whatever for setting the timeframe, the corresponding regexes could be created from that setting. However, it would still require always reading the complete dataset from the SD card...

DE-cr commented 1 year ago

Yes, but that size penalty would hurt only once, not for working with the graph (zoom, pan, ...), and also not for its initial display.

Interactive filter settings could take different forms, e.g. show only the most recent n lines, show only data between dates x and y (possibly with y = now as a default), use only every tenth value, ... Since /D shows dates as dd.mm.yyyy and not yyyy-mm-dd, deciding whether a date lies between x and y isn't as simple as it could be; for that, I'd suggest transforming the current format into the latter one before doing the actual filtering.
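The reordering itself is just a string shuffle; a minimal sketch (in C++ for consistency with the firmware code, though in /DG it would live in JavaScript):

```cpp
#include <Arduino.h>

// Turn /D's "dd.mm.yyyy" into sortable "yyyy-mm-dd", so that a plain
// string comparison matches chronological order (x <= date && date <= y).
String toSortableDate(const String &d) {
  return d.substring(6, 10) + "-" + d.substring(3, 5) + "-" + d.substring(0, 2);
}
// e.g. toSortableDate("15.02.2023") == "2023-02-15"
```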

Do you know if there's enough people using /DG to display large datasets to warrant the implementation effort? So far I've thought that /DG is not meant to replace a proper FHEM system's graphing capabilities.

fredlcore commented 1 year ago

Of course it would hurt for its initial display, because you would still have to load the complete dataset even if you only want to display the data for the last week. That's where a SQLite implementation would significantly speed things up.

As for FHEM: If you have ever worked with FHEM, you'll know that its graphing capabilities are among its weakest points. FHEM's strength lies in fine-grained control of functions, which is something that BSB-LAN cannot reproduce. The functionality of /DG as it is now is not really usable unless the user takes regular action (i.e. regularly deleting/moving data). Having a function that is half-baked does not make sense, so we either have to improve it or remove it. And yes, there are quite a few people using it.

DE-cr commented 1 year ago

That's a bit like saying: all cars that cannot carry a ton of bricks in their trunk while doing 200 km/h don't make sense, or all cars that need to be refueled are not really usable. I consider the current /DG implementation very useful, and much more versatile than cars, to stick with that example: it gives you both detailed (speed) and long-period (load capacity) graphs, and if you're patient or willing to clear the data (refuel) every once in a while, even both at the same time. ;)

I'll take a look at extending the current /DG implementation to interactively filter data on the client side, if that's desired. Doing the filtering on the server side (SQLite, ...) or providing a completely different /DG implementation is something I don't see within my capabilities.

DE-cr commented 1 year ago

If/when filtering on the client side is done: The data in /D is highly redundant and would compress very well, reducing transmission time. Of course, server-side filtering would be better, though.

fredlcore commented 1 year ago

The older I get, the less I think that analogies are helpful in any way, because people tend to discuss the details of the analogy more than the actual matter. I have found an implementation of SQLite that could work with BSB-LAN, but it's written for the IDF environment, so I guess it needs patching to work with the Arduino IDE which we use in this project: https://github.com/siara-cc/esp32-idf-sqlite3/tree/c34beba7318fc00953b33db215b10dd23cb31924

DE-cr commented 1 year ago

Sqlite3.c is 7.36 MB. That sounds pretty big to me for an arduino/esp32 library used for what is, to my understanding, not bsb-lan's core functionality.

fredlcore commented 1 year ago

You know that the size of the source code (including in this case hundreds of lines of comments) says nothing about the size of the compiled binary?

DE-cr commented 1 year ago

I've seen enough code to know that, as a general rule, there is some correlation between source code size and binary code size. And I've seen enough code to be skeptical about fitting SQLite3 functionality in just a few KB. The size given here sounds more reasonable to me - or at least that order of magnitude.

The current /DG implementation accounts for less than 2 KB of bsb-lan's binary code, iirc.

fredlcore commented 1 year ago

So you seriously compare a full-blown SQLite installation on a regular PC with an adaptation with reduced functionality specifically designed for a microcontroller?

If you want me to take your argument seriously, provide relevant, fact-based information, for example by compiling one of the examples and seeing how much memory it consumes. This is what I will eventually do once I have spare time. Otherwise this discussion leads us nowhere.

DE-cr commented 1 year ago

On my system, the sqlite3_spiffs example from https://github.com/siara-cc/esp32_arduino_sqlite3_lib uses 624653 bytes for the program and 22004 for global variables.

The ESP32_Console example from https://github.com/siara-cc/sqlite_micro_logger_arduino uses 376505 / 22288 bytes.

The BareMinimum example that comes with the Arduino IDE uses 211053 / 16048 bytes.

Subtracting the BareMinimum values from the SQLite / MicroLogger examples' values leaves us with 413600 / 5956 and 165452 / 6240 bytes, respectively.

Both 404 KB and 162 KB are the same order of magnitude as 500 KB. q.e.d. ;)

fredlcore commented 1 year ago

Thanks. Have you deducted the boilerplate that comes with every ESP32 binary? That alone is already several dozen kilobytes and is already part of the BSB-LAN binary, so it would not be added when using SQLite.

DE-cr commented 1 year ago

Yes, see above.

fredlcore commented 1 year ago

Ah, ok, noted. Adding one of the SD_MMC examples (that's what will be used for SD card logging) to BSB-LAN's code adds 366kB to flash memory and 5kB of used RAM. While 500kB would have more or less maxed out the available flash in standard ESP32 configuration running BSB-LAN, 366kB still leaves quite some air to breathe. The Micro Logger library is not an option as it does not (yet) support the VACUUM command to actually shrink the size of a database after deleting expired rows.

DE-cr commented 1 year ago

I've implemented client-side filtering to alleviate /DG performance issues when dealing with large datasets: two date input fields (from ... to) to narrow down the data to be plotted. When loading /DG, they are set to the min/max dates found in the dataset (/D) - unless there are >10k lines of data, in which case the 'from' date is set to where the last 10k lines of data begin.

To signal when not all available dates are selected for plotting, a '!' is put next to the date input field that's cutting off data (both in the screenshot). Changing one of the date input fields redraws the graph.

Compared to server side filtering (using SQLite or whatever), this solution still wastes time by both transmitting the whole /D dataset for /DG, and filtering it each time the plot is redrawn. However, what takes up the bulk of the time is highly reduced (unless the user sets the from/to dates to include all data): the actual plotting of thousands of values.

I've tested this implementation using Firefox 109.0.1 and Chromium 110.0.5481.100 on Ubuntu 22.04.1, as well as on my cheap, old Android phone with Chrome.

The code change adds 544 Bytes to the bsb-lan binary's size.

Let me know if I should upload this /DG version to github!

DE-cr commented 1 year ago

P.S. I've tested this on my i5-3320M / 2.6 GHz laptop using a data set of >300k data points and found the performance quite good - unless I set the from/to date filter to plot more than about 30k data points at once.

fredlcore commented 1 year ago

Cool, thanks! 544 bytes sounds great, of course. How long did it take to display, say, a week's worth of data from this 300k dataset? I assume the file would then be around 30 MB (or that order of magnitude ;) )?

EDIT: I'm not sure if I remember correctly, but I think you are not using an SD card in your setup? The main problem is that the data throughput of the SD card is rather slow (on the Arduino it was several dozen kB/s). So downloading a 30 MB file to display would take 300 seconds at 100 kB/s. That was the point of departure for me to think of a server-side solution which does not have to parse the whole file just for a subsection, because SQL(ite) would know where to look for the relevant data sets.

fredlcore commented 1 year ago

We could then think of a URL-command that purges the dataset except for the last x days of data (worst case by copying over the remaining data points from the old data file to a new file, removing the old and renaming the new file).
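A minimal sketch of that copy-and-rename purge, assuming the ESP32 SD API (SD.rename() is not available on every Arduino core) and /D's timestamp format; helper names are made up:

```cpp
#include <SD.h>

// Timestamp (second ';'-separated field) of a datalog line.
String timestampOf(const String &line) {
  int a = line.indexOf(';');
  return line.substring(a + 1, line.indexOf(';', a + 1));
}

// "dd.mm.yyyy hh:mm:ss" -> "yyyymmddhhmmss", so plain string
// comparison matches chronological order.
String sortable(const String &ts) {
  return ts.substring(6, 10) + ts.substring(3, 5) + ts.substring(0, 2)
       + ts.substring(11, 13) + ts.substring(14, 16) + ts.substring(17, 19);
}

// Keep only lines at or after `cutoff` ("yyyymmddhhmmss"), then swap files.
bool purgeDatalog(const String &cutoff) {
  File in  = SD.open("/datalog.txt", FILE_READ);
  File out = SD.open("/datalog.new", FILE_WRITE);
  if (!in || !out) return false;
  out.println(in.readStringUntil('\n'));  // keep the header line
  while (in.available()) {
    String line = in.readStringUntil('\n');
    if (line.length() && sortable(timestampOf(line)) >= cutoff) out.println(line);
  }
  in.close(); out.close();
  SD.remove("/datalog.txt");
  return SD.rename("/datalog.new", "/datalog.txt");
}
```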

DE-cr commented 1 year ago

Times on my system (see above), using Firefox, data set = 31 days * 24 hours * 60 minutes * 7 parameters = 312,480 data points:

The above times are for plotting only, not including the /D data transfer from server to client. I've not done repeated timings for the data subsets above, only a single timing each.

I've done the above timing tests w/o including bsb-lan communication, loading a local file from my computer instead. Since I've created that file artificially (and because I'm lazy), it is smaller than actual /D output and weighs in at 14 MB. A typical line in this file looks like this: 20230131235900000;2023-01-31 23:59:00;7;7;120; instead of something like 364593010;01.05.2022 00:00:15;8314;Kesselrücklauftemperatur Ist;66.7;°C.

DE-cr commented 1 year ago

Btw, I don't see myself running into /DG performance issues anytime soon, even with the existing implementation: I currently log two parameters only, once a day. :)

DE-cr commented 1 year ago

Yes, my ESP32 doesn't have an SD card.

fredlcore commented 1 year ago

Thanks, I didn't see the rendering on the local PC or mobile phone as a problem, and your findings support that assumption, although I'm a bit surprised that rendering a month takes 15 seconds. For comparison: I log 49 values every five minutes for 28 days (i.e. approx. 395,000 data points) into a SQLite database on my Synology NAS (which runs on a mediocre Intel processor) and have a PHP script displaying these data points in 14 different graphs; dynamically generating these 14 graphs with all the data contained on one page takes 6-7 seconds, from pressing the reload button to the page finishing loading. But even though that's 50% faster, I wouldn't mind waiting these 15 seconds on the client side to display this amount of data, because that does not affect running BSB-LAN. The problem is rather waiting for the server to send (in your case 14 MB of) data. I just ran a check on a 350 kB file and the performance is around 150-180 kB/s. Weirdly, the Due isn't much slower than the ESP32 Olimex EVB, and even more weirdly, performance is significantly lower (approx. 50-80 kB/s) when using WiFi compared to Ethernet (not sure why; the WiFi connection should not be a problem, BSB-LAN sits right next to the router).

So transferring a 14 MB file would take 1.5 minutes on the Due and the (Ethernet) Olimex, during which BSB-LAN couldn't do anything else, i.e. queries or logging entries would not be possible during that time. On the ESP32, there might be a chance to handle the data preparation and sending on a different core, but not on the Due. The advantage of this approach would be that moving through the data could be done without additional data transfers. The drawback is that refreshing the data would always fetch the complete file again, taking another 1.5 minutes.

I guess I'll have to run some performance tests with the ESP32 SQLite implementation and see if that significantly speeds things up on the server side. If it doesn't then I'd happily take you up on your offer.

fredlcore commented 1 year ago

I just ran the tests that come with the ESP32 SQLite implementation: Counting all 9000 rows in a randomly generated database takes around 3.5 seconds. Running a query across these 9000 entries that match a certain condition and list the first 10 of them takes 0.25 seconds - including the time it takes to output the contents on the serial console.
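This is where a database beats a flat file: with an index on the timestamp column, a date-range query only touches the matching rows instead of parsing the whole file. A sketch using the standard sqlite3 C API (table and column names are made up):

```cpp
#include <Arduino.h>
#include <sqlite3.h>

// Fetch only the rows between two timestamps; with an index in place,
// SQLite seeks directly to the range instead of reading everything.
void queryRange(sqlite3 *db, const char *from, const char *to) {
  sqlite3_exec(db, "CREATE INDEX IF NOT EXISTS idx_ts ON datalog(logged_at);",
               nullptr, nullptr, nullptr);
  sqlite3_stmt *stmt;
  sqlite3_prepare_v2(db,
      "SELECT logged_at, param, value FROM datalog "
      "WHERE logged_at BETWEEN ?1 AND ?2;", -1, &stmt, nullptr);
  sqlite3_bind_text(stmt, 1, from, -1, SQLITE_STATIC);
  sqlite3_bind_text(stmt, 2, to, -1, SQLITE_STATIC);
  while (sqlite3_step(stmt) == SQLITE_ROW) {
    Serial.printf("%s;%d;%s\n",
                  (const char *)sqlite3_column_text(stmt, 0),
                  sqlite3_column_int(stmt, 1),
                  (const char *)sqlite3_column_text(stmt, 2));
  }
  sqlite3_finalize(stmt);
}
```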

DE-cr commented 1 year ago

An easy solution for people logging much data and usually just wanting to display the newest entries would be to add a URL command to output the newest n kB of /D only, e.g. /DEn or maybe just /DE with n fixed or configurable in /C (/DE = data end). This could very easily be served by seeking n kB back from the end of the open datalog.txt, skipping to the first \n from there, and delivering the rest. If the user should miss the header line in this approach, that could also be easily fixed (but the current /DG implementation doesn't need it: it just skips the first line).
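The seek-and-skip approach would only take a few lines; a minimal sketch assuming the Arduino SD/Print APIs (names are illustrative, not BSB-LAN's actual ones):

```cpp
#include <SD.h>

// Stream only the last n KB of datalog.txt to `out` (whatever Print
// the web server writes to), dropping the cut-off partial first line.
void serveDatalogTail(Print &out, uint32_t nKB) {
  File f = SD.open("/datalog.txt", FILE_READ);
  if (!f) return;
  uint32_t sz = f.size();
  uint32_t tail = nKB * 1024UL;
  if (sz > tail) {
    f.seek(sz - tail);
    f.readStringUntil('\n');   // skip the most likely incomplete first line
  }
  uint8_t buf[512];
  while (f.available()) {
    size_t n = f.read(buf, sizeof(buf));
    out.write(buf, n);
  }
  f.close();
}
```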

This could be implemented in addition to my current /DG modification, which could then use e.g. /DE100 per default to fetch the data, with a possible extension to fetch /D completely on request.

And yes, it seems C3 and/or D3 has not been designed to handle large datasets. However, I'm still amazed by what it does, and how easy it is to use - for free.

DE-cr commented 1 year ago

As for the transfer times: Did I mention that /D data could easily be compressed, either by running it through a general purpose compressor or by using a different format? ;)

When testing my current /DG mod, I repeatedly found myself jumping between different days in that month's log data, to compare those days. This use case would still require transferring lots of data, i.e. server-side filtering would not help as much as you might think/wish.

DE-cr commented 1 year ago

Btw, waiting 15s for the initial display might be ok, but as @CZvacko says: "working with the chart is quite slow. Any zooming or other operation takes a long time". He/she did not complain about the transfer time. Just saying. ;)

Of course, for a system whose main functionality I see in communicating with the boiler control (not with a user / web client), I also consider it a priority not to bind the server 100% to /DG (or /D) data transfers for extended periods of time.

DE-cr commented 1 year ago

One advantage of using a naked standard ESP32 for logging is that you're unlikely to run into /DG performance issues. ;)

Ensuring that bsb-lan doesn't get blocked by extended /D data transfers could also be done by artificially imposing a size limit on datalog.txt. If the standard use case is to look at the most recent data only, this could still be easily achieved by using datalog.txt as a rolling buffer, i.e. opening it read-write for logging and seeking to (or staying at) the position after the most recent write operation (or to the file's start after roll-over) before adding new data. Staying with a plain text format for datalog.txt, this could be made easier by switching to a constant line length, filling lines up with spaces where required. (The current behavior is to just stop logging before bsb-lan runs out of file system space, which means it is the most recent values that get lost.)
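A minimal sketch of such a rolling buffer with constant-length lines (sizes and names are assumptions; opening with "r+" requires the file to already exist):

```cpp
#include <SD.h>

const size_t LINE_LEN  = 60;      // fixed line length, including '\n'
const size_t MAX_LINES = 100000;  // hard cap on the file's size
size_t nextLine = 0;              // write position, recovered at startup

void logLine(String line) {
  while (line.length() < LINE_LEN - 1) line += ' ';  // pad with spaces
  line = line.substring(0, LINE_LEN - 1) + '\n';     // enforce fixed length
  File f = SD.open("/datalog.txt", "r+");
  if (!f) return;
  f.seek(nextLine * LINE_LEN);                       // overwrite oldest data
  f.print(line);
  f.close();
  nextLine = (nextLine + 1) % MAX_LINES;             // roll over at the cap
}
```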

fredlcore commented 1 year ago

At the moment, I find the access times of SQLite quite compelling.

DE-cr commented 1 year ago

If SD card reading is the bottleneck:

...and do count me in on users of /DG. :)

CZvacko commented 1 year ago
> Data logging has been a part of BSB-LAN since almost exactly six years ago.

In my case the story is as follows: after completing the construction of the house and setting up the heating, I noticed incorrect behavior (cycling) during the spring-summer or summer-autumn transition. This problem led me to look for some way to debug it; then I found out that BSB-LAN exists, so I ordered it. Datalogging is an essential feature, without it I can't tell the service technician exactly what is happening and when. Without logging, it's just guesswork that leads to a trial-and-error setup procedure (even for an experienced technician). And the reason why a longer data log is needed is that the technician will usually come to you maybe two weeks after the call and want to see some history (compare multiple days). Now that my cycling issue has been resolved, there is just some "room for improvement" in efficiency. In the long term I plan to set up a more robust solution like Grafana, but that's the distant future (I have other priorities right now).

As for the log format: I've never used uptime, and I'm not sure of its purpose. I also sometimes export the data and create a chart in Excel. For that, the date+time, parameter number and value should stay AS IS. The description parameter can be removed and added back later with a VLOOKUP.

fredlcore commented 1 year ago

@DE-cr: Again: Please provide proof that such a compression actually saves server-side processing time. Yes, some information could be generated on the fly, but it would require thousands of lookups in cmdtbl, which in itself is not the most ideal form of storage for lookups, but it is what we have. Caching may not be possible because we don't know the exact number of logged parameters; plus, logged parameters may change within one and the same file. Assuming on average a 50% shrink is possible and my pessimistic assumptions are not true, that's still close to a minute for the download. Better, but still not really satisfactory.

The main advantage I see in this is that it would also run on an Arduino Due, where I'm not sure the SQLite solution would work. So implementing this approach would not be wasted time, because we would have to support two separate ways of storing data anyway, depending on the platform. So if you or anyone else would like to come up with something (binary storage, rolling buffer), I would be interested to see how it performs, and if it is faster by any means, switch to that for the time being.

DE-cr commented 1 year ago

If so desired, I'd be willing to try my hands at the /DE approach I've drafted above, as I think this would be easy and sufficient to serve the most common /DG use cases well.

If someone wants to look at graphs of huge datasets, I'd recommend patience, or systems better suited to that task than a micro controller serving data over wifi from an SD card. :)

DE-cr commented 1 year ago

My implementation of /Dn, sending only the most recent n KB (minus the most likely incomplete line at the cut-over point, plus the datalog header), adds 244 bytes to the bsb-lan binary. This should be helpful for both /D and /DG when dealing with large datalogs, allowing transmission times to be cut down.

In case we can agree on a fixed value for n to be used in /DG (how about 999, which should translate to about 15000 data points?), all that's left to do is add that n in the loading of /D in the /DG implementation (adding another 3 bytes to the bsb-lan binary), and to decide if we want to couple this with the client side filtering in my /DG modification described above.

Should a variable value of n be required for /DG, that could be handled in /DG's javascript code, introducing another GUI control there to set n (which, of course, would further increase the size of the bsb-lan binary).

Please note that my testing of this latest modification has been limited: I don't have an SD card in my bsb-lan system, and my current datalog.txt is less than 10k in size. However, I don't see how this should affect the test's significance.

fredlcore commented 1 year ago

Thanks, but does it make sense to implement this before potentially switching to a binary storage format etc.? I often hate it if I have to touch the same thing twice in a short time only to revert part of the changes I've made shortly before. As for the value used for DG, I would calculate it dynamically to match the last three days. I.e. (((3*24*60*60)/log_interval)*log_parameters)*60 (bytes?). Log parameters and interval can be obtained via /JI for example.

fredlcore commented 1 year ago

I'm just thinking about the compression of the fields: I think we can do away with the milliseconds. While this is a nice way of finding out when the microcontroller has rebooted (due to a crash or a blackout, maybe), it is not strictly necessary. With UNIX time we have the problem that it is still 32 bits on the ESP32, at least at the moment. Making it workable until the year 9999 would cost us 38 bits for YMD-HMS. 15 bits for the parameter number, so we end up with 53 bits, not counting the value and a "key" field which we need in order to figure out which is the newest/oldest data row as a reference point (we cannot use the date/time field because not all heaters provide a usable time, so these would start at Epoch after each reboot). The key would probably be 32 bits, so that's 85 bits in total for everything except the value.

For fixed-length rows, the value field would have to be set at the maximum of 32 bytes, which is the maximum length of a telegram payload. I'm not sure if reducing this would be feasible, because especially novice users who are logging, for example, the heater's status won't be happy if only the numerical status is logged and not the clear text (which of course won't make sense to display in a chart, but we have had this dual use before). Of course the option values could be converted to readable strings on the fly as well, but then I'm even more sceptical whether this will save processing time on the server. But even if we do so, we would have to set aside a 'safe' number of bytes to display floating point numbers up to a certain precision. Maybe 10 bytes could be considered safe if option values are decoded on the fly.
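Putting those numbers together, a fixed-length row could look roughly like this; a sketch of the layout discussed, not a settled format (field widths follow the bit counts above, 'spare' pads to a byte boundary):

```cpp
#include <stdint.h>

#pragma pack(push, 1)
struct LogRecord {          // 22 bytes per row, per the estimates above
  uint32_t key;             // monotonically increasing row ID (reference point)
  uint64_t datetime : 38;   // packed YMD-HMS, workable until year 9999
  uint64_t param    : 15;   // parameter number
  uint64_t spare    : 11;   // padding up to the next byte boundary
  char     value[10];       // fixed-width value field, options decoded on the fly
};
#pragma pack(pop)
```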

Another thing to consider is keeping track of the most recent data row in the rolling buffer. While a variable can keep track of this during runtime, a fast method to find this row during startup is necessary, one that doesn't block the startup process for too long, maybe a binary search using the above-mentioned key field.
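Since the key increases monotonically and the file is written as a ring, finding the newest row at startup is the classic search for the pivot of a rotated sorted array, taking O(log n) reads instead of a full scan. A sketch, assuming the LogRecord layout from above and a file containing at least one record:

```cpp
#include <SD.h>

// Key of the idx-th fixed-size record (LogRecord as sketched above).
uint32_t keyAt(File &f, uint32_t idx) {
  LogRecord r;
  f.seek(idx * sizeof(LogRecord));
  f.read((uint8_t *)&r, sizeof(r));
  return r.key;
}

// Index of the newest record: find the pivot of the rotated key sequence.
uint32_t newestIndex(File &f) {
  uint32_t lo = 0, hi = f.size() / sizeof(LogRecord) - 1;
  if (keyAt(f, lo) < keyAt(f, hi)) return hi;    // not rolled over yet
  while (lo < hi) {
    uint32_t mid = (lo + hi + 1) / 2;
    if (keyAt(f, mid) > keyAt(f, lo)) lo = mid;  // newest is at or right of mid
    else hi = mid - 1;                           // newest is left of mid
  }
  return lo;
}
```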

DE-cr commented 1 year ago

> As for the value used for DG, I would calculate it dynamically to match the last three days. I.e. (((3*24*60*60)/log_interval)*log_parameters)*60 (bytes?). Log parameters and interval can be obtained via /JI for example.

Current log parameters are available via /JI, but they could have been changed during the previous days, logging could have been stopped in between, etc., and 60 bytes may or may not be the average line length. And with my current logging setup (two values, once a day), limiting /DG to three days' worth doesn't make sense.

I'm in favor of just using a reasonable KB value that won't bog down the system too much. Hopefully that covers more than enough data, so that the user can just narrow it down to the last three (e.g.) days by using my current /DG javascript filtering. ...and if we let the user change that KB value, there will be enough flexibility to cover most needs.

And after some more thinking, I'm against changing the datalog.txt format to e.g. binary:

As for the uptime [ms] value: /DG uses it to avoid gaps in the plots when values from one data set have differing timestamps, which is quite common at the one-second resolution used. It is much easier to check whether the uptime ms differs by more than 999 from that of the previous value than to figure out that 31.12.2022 23:59:59 and 01.01.2023 00:00:00 are almost the same.

fredlcore commented 1 year ago

Well, the /JI information is just a ballpark which will fit most of the time, but as you said before, if you don't exactly get what you expect, it won't hurt that much. However, in most cases, it will be much more accurate than just using a random amount of kB.

I don't understand why you're now against switching to a binary format when you proposed it before and actually got me mostly convinced to do that instead of going down the SQLite way:

Does /DG actually use the ms value to sort plot data? I thought that it uses the full date. And then it doesn't really matter if the points are not actually second-aligned as you won't notice it unless you hover over the data point and compare the times. The scales won't reflect such differences because they are in larger units IIRC.

DE-cr commented 1 year ago

Binary format in file storage only, with the client interface /D untouched? Sure, if you consider the gain worth the effort. I don't see it directly addressing this feature request's original issue, though, which is why I've now changed my focus.

/DG currently uses the ms values to check temporal coherence of adjacent data points, to avoid gaps in the plot lines. => https://github.com/fredlcore/BSB-LAN/pull/459#issuecomment-1157423360

fredlcore commented 1 year ago

I wanted to address the original issue using SQLite. You argued against it and suggested a binary format. I agreed to look into that approach. If you've changed your focus because all of a sudden you don't see a connection to the original issue, that's perfectly fine; no one is forced to do anything here that they don't find useful, and that includes both of us.

As for the coherence of data points, this could also be done using the unique row ID if the row ID changes only when a new log interval is logged.

fredlcore commented 1 year ago

As for the gain being worth the effort: You said you would take a "78% tax cut". And a 50% cut is suddenly not worth the effort? Where do you draw the line, and why?

DE-cr commented 1 year ago

Letting ordinary users determine an abstract amount of kB is counter-intuitive and will just increase support work for us.

I'd expect even more confusion when bsb-lan promises to deliver three days' worth of data, but delivers either more or less than that, due to changes in the log settings during the last three days. (This of course could easily be fixed by SELECTing by date.)

For me, it's usually the gain/effort ratio that guides me, not gain alone, but as you've said: YOU decide what is (not) done.

fredlcore commented 1 year ago

Again, above you said that it doesn't matter if you get 2.9 days instead of 2 days, and now this is suddenly creating too much confusion? How would users be able to make any kind of distinction based on an abstract amount of kilobytes? Anyway, I think all relevant arguments have been exchanged; I'll see what I can implement, and when.

EDIT: Plus, the JavaScript will be able to find out if the requested date range is covered by the calculated amount. If it is not enough, it can load more. This seems far more efficient than always loading at least X kB and then figuring out that it is way too much (if someone just logs two parameters per day, as you do).

DE-cr commented 1 year ago

If the promise is n days, I find delivering more or less than that confusing. If the promise is n KB of CSV data, I'll gladly accept a few bytes less. How to gauge how many KB are needed to get n days? Try and adapt. I would expect no more than one adaptation cycle in most cases, and at least with frequent use, most people should be able to memorize the value that suits their needs best.

From decades of dealing with user requirements, I've learned that clients often change their mind, or realize that they guessed wrong, e.g. that they need seven days of data in /DG instead of three. My proposal is to (per default) deliver as much as the system can handle without problems, and let the user filter out unwanted data (which frequently proves to be more interesting than the client's initial order), or ask for more, if so desired.

As for my currently active daily-averages-logging use case: With that I always want to see as much data as available.