Bouni / python-luxtronik

python-luxtronik is a library that allows you to interact with a Luxtronik heat pump controller.
MIT License

Are frequent parameter changes dangerous? #158

Closed: gerw closed this issue 6 months ago

gerw commented 6 months ago

Recently, I came across an interesting post which reads

I can't comment much on older Luxtronik versions, but if you would reverse engineer, you would see that the 2.1 Luxtronik has a Micron 1.8V 128MB NAND, which pretty certainly means it's one of those 100k erase count chips, and trust me they do balance the eraseblocks. There is an Atmel ARM-board in the luxtronik. Without going too much into the details my take is that changing a setting once or twice a day isn't going to wear that thing out any time soon, but I wouldn't go excessive on those settings changes.

Similarly, a Finnish post (maybe by the same person) translates to

The little birds sang that Luxtronik has Micron's 128 megabyte 1.8V Nand flash, which based on that voltage is most likely SLC technology and lasts up to 100,000 erase cycles per erase block. I hear that on average there are not fifty writes per erase block per year, so we can get to the point that the luxtronik will never die in terms of the number of writes. Even if you fiddle with its settings several times a day, it won't.

So I think tuning the relays is pointless. That Nand is very durable, something the Germans have done right.

Now I am wondering what "I wouldn't go excessive" really means.

I tried to come up with some numbers. The first thing I don't know is the size of the erase blocks. In what follows, I use 128 KiB, but I might be totally wrong. Maybe we can use the featurebug and get this information via ssh? A 128 MiB chip then holds 1024 erase blocks, so with 100k erase cycles per erase block we have roughly 100M "write actions" (each one writing one erase block).

The next thing that we need to know is how many periodically occurring write actions we have.

Is there any idea how to estimate the file system overhead? Would it help to know the file system?

I think there are no other frequently occurring write actions. I think we can safely ignore firmware updates and other stuff which only happens now and then.

So, let us assume that each DTA file write takes 5 write actions and each parameter change takes 3 write actions. Crunching some numbers, I get the following lifetime estimates:

without any parameter changes: 2851.9 years
1 parameter change per hour:   1629.7 years
10 parameter changes per hour:  335.5 years
1 parameter change per minute:   62.0 years
1 parameter change per second:    1.1 years
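The post does not spell out the formula behind this table, so here is a reconstruction as a minimal sketch. It assumes 100M total write actions, 3 write actions per parameter change, perfect wear leveling, and a 365.25-day year. The baseline of 4 write actions per hour (for example, one DTA write every 75 minutes at 5 actions each) is not stated in the thread; it is back-calculated so that all five rows come out as listed.

```python
# Reconstruction of the lifetime table above (assumptions, not measured values).
HOURS_PER_YEAR = 24 * 365.25      # 8766 h; a 365-day year does not reproduce the table
TOTAL_WRITE_ACTIONS = 100e6       # ~1024 erase blocks x 100k erase cycles, rounded down
ACTIONS_PER_CHANGE = 3            # assumed write actions per parameter change
BASELINE_ACTIONS_PER_HOUR = 4     # hypothetical: back-calculated from "2851.9 years"


def lifetime_years(changes_per_hour: float) -> float:
    """Years until all write actions are used up, assuming perfect wear leveling."""
    per_hour = BASELINE_ACTIONS_PER_HOUR + ACTIONS_PER_CHANGE * changes_per_hour
    return TOTAL_WRITE_ACTIONS / (per_hour * HOURS_PER_YEAR)


if __name__ == "__main__":
    rates = [("none", 0), ("1/hour", 1), ("10/hour", 10), ("1/minute", 60), ("1/second", 3600)]
    for label, rate in rates:
        print(f"{label:>9}: {lifetime_years(rate):7.1f} years")
```

Since the result is inversely proportional to the write actions per hour, an error of 10x in any one assumption (actions per change, file system overhead) shifts every row by roughly 10x.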

These numbers should be taken with a huge bag of salt (as I do not have any clue what is happening on the NAND level). However, the lifetime of 2850 years fits nicely with the 50 erase cycles per block per year from above.

If these numbers are roughly correct, writing a parameter each second will kill your heat pump controller within one year. Moreover, one should bear in mind that these numbers could easily be wrong by one or two orders of magnitude. In this case, even 1-2 parameter changes per hour could be dangerous in the long term.

Does anybody have some ideas how to check/validate/improve these numbers?

gerw commented 6 months ago

I just asked my heat pump:

# ubinfo /dev/ubi0
ubi0:
Volumes count:                           6
Logical eraseblock size:                 129024
Total amount of logical eraseblocks:     1024 (132120576 bytes, 126.0 MiB)
Amount of available logical eraseblocks: 62 (7999488 bytes, 7.6 MiB)
Maximum count of volumes                 128
Count of bad physical eraseblocks:       0
Count of reserved physical eraseblocks:  10
Current maximum erase counter value:     179
Minimum input/output unit size:          2048 bytes
Character device major/minor:            253:0
Present volumes:                         0, 1, 2, 3, 4, 5

The erase block size really is about 128 KiB (129024 bytes per logical eraseblock). If I interpret these numbers correctly, I have (at least) one erase block with 179 erases, and all others have at most that many. My heat pump is now almost three years old, so again the numbers align well with the posts above. Since October 2023, I have been changing parameters quite often (50-100 times per day).

At the time of the last boot (229 days ago), the counters were different:

# dmesg | grep "NAND device" -A 75 
NAND device: Manufacturer ID: 0xc8, Chip ID: 0x61 (Unknown ESMT NAND 128MiB 1,8V 8-bit)
Scanning device for bad blocks
1 cmdlinepart partitions found on MTD device atmel_nand
Creating 1 MTD partitions on "atmel_nand":
0x000000000000-0x000008000000 : "UBI"
UBI: attaching mtd0 to ubi0
UBI: physical eraseblock size:   131072 bytes (128 KiB)
UBI: logical eraseblock size:    129024 bytes
UBI: smallest flash I/O unit:    2048
UBI: sub-page size:              512
UBI: VID header offset:          512 (aligned 512)
UBI: data offset:                2048
UBI: attached mtd0 to ubi0
UBI: MTD device name:            "UBI"
UBI: MTD device size:            128 MiB
UBI: number of good PEBs:        1024
UBI: number of bad PEBs:         0
UBI: max. allowed volumes:       128
UBI: wear-leveling threshold:    4096
UBI: number of internal volumes: 1
UBI: number of user volumes:     6
UBI: available PEBs:             62
UBI: total number of reserved PEBs: 962
UBI: number of PEBs reserved for bad PEB handling: 10
UBI: max/mean erase counter: 117/91
UBI: image sequence number: 0
[...]
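The two readings allow a rough extrapolation. The sketch below takes the growth of the maximum erase counter (117 at boot, 179 now, 229 days apart) as the wear rate of the most-worn block and asks how long that block would take to reach 100k erases. The 100k endurance figure comes from the quoted posts, not from a datasheet. The maximum at boot and the maximum now may belong to different blocks, so this is only a rough indicator. The rate already includes the 50-100 parameter changes per day mentioned above.

```python
# Hypothetical extrapolation from two UBI "max erase counter" readings.
ENDURANCE = 100_000      # assumed erase cycles per block (from the quoted posts)
DAYS_PER_YEAR = 365.25


def erases_per_year(max_then: int, max_now: int, days: float) -> float:
    """Yearly growth of the maximum erase counter between two readings."""
    return (max_now - max_then) / days * DAYS_PER_YEAR


def years_left(max_now: int, rate_per_year: float, endurance: int = ENDURANCE) -> float:
    """Years until the most-worn block reaches the assumed endurance at this rate."""
    return (endurance - max_now) / rate_per_year


rate = erases_per_year(117, 179, 229)
print(f"{rate:.1f} erases/year on the most-worn block")
print(f"{years_left(179, rate):.0f} years until {ENDURANCE} erases")
```

Under these assumptions the hottest block gains about 99 erases per year, which would put the limit roughly a thousand years away.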
BenPru commented 6 months ago

Great question. I have asked myself the same in the past, so I'm very interested.

gerw commented 6 months ago

I read a little about UBIFS, the file system used on the heat pump, and realized that my first post contains a misconception: on the NAND memory, data is written in chunks of 2048 bytes (a "page"). An erase block only has to be erased once all of its pages have been written. Hence, the situation should be somewhat better than the estimates in my first post suggest, since the 9 kB of appl_param1 can be written in 5 pages (+ file system overhead) instead of a full erase block.
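The page arithmetic can be sketched using the sizes reported by `ubinfo` above. This deliberately ignores UBIFS overhead (journal, index nodes, garbage collection), which consumes additional pages, so the real number of file writes per erase is lower than the idealized figure computed here. Reading "9 kB" as 9 KiB is an assumption; 9000 bytes also rounds up to 5 pages.

```python
import math

PAGE_BYTES = 2048             # "Minimum input/output unit size" from ubinfo
LEB_BYTES = 129024            # "Logical eraseblock size" from ubinfo
PARAM_FILE_BYTES = 9 * 1024   # assumption: "9 kB" of appl_param1 read as 9 KiB

pages_per_leb = LEB_BYTES // PAGE_BYTES                       # pages in one logical eraseblock
pages_per_write = math.ceil(PARAM_FILE_BYTES / PAGE_BYTES)    # pages per file write, no FS overhead
writes_per_erase = pages_per_leb // pages_per_write           # idealized file writes per erase

print(pages_per_leb, pages_per_write, writes_per_erase)
```

In this idealized picture, one erase absorbs a dozen parameter-file writes instead of one, which is why the earlier estimates are pessimistic.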

However, I still think that one should not attempt to change parameters every second. If one limits oneself to one change per minute, one should be safe, but I will offer no guarantees...

Bouni commented 6 months ago

Do you think we should add a default throttle that prevents high-frequency writes (but can be overridden if a user wants to and is aware that this shortens the heat pump's lifespan)?
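Such a throttle could look roughly like the sketch below. `ThrottledWriter` is a hypothetical name and is not part of python-luxtronik; `write_fn` stands in for whatever call actually sends parameters to the controller. The 60-second default follows the "one change per minute" suggestion above, and `force=True` is the explicit opt-out.

```python
import time
from typing import Any, Callable, Optional


class ThrottledWriter:
    """Hypothetical rate limiter around a write function (not a library API)."""

    def __init__(
        self,
        write_fn: Callable[..., Any],
        min_interval_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._write_fn = write_fn
        self._min_interval_s = min_interval_s
        self._clock = clock                       # injectable for testing
        self._last_write: Optional[float] = None  # time of the last performed write

    def write(self, *args: Any, force: bool = False, **kwargs: Any) -> bool:
        """Forward to write_fn unless the previous write was too recent.

        Returns True if the write was performed, False if it was dropped.
        force=True bypasses the limit for users who accept the extra flash wear.
        """
        now = self._clock()
        too_soon = (
            self._last_write is not None
            and now - self._last_write < self._min_interval_s
        )
        if too_soon and not force:
            return False
        self._write_fn(*args, **kwargs)
        self._last_write = now
        return True
```

This version simply drops writes that arrive too early. A variant that remembers the latest requested value and writes it once the interval has passed would avoid losing the final setting, at the cost of needing a timer or background task.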

gerw commented 6 months ago

I am not sure. I can imagine that the situation is somewhat better than my rough calculations suggest, in which case frequent writes should not be a problem. If they really are a problem, then this is rather a bug in the Luxtronik controller itself and should be fixed in the firmware.

I would vote for adding a warning to the README (maybe including a link to this issue).

Bouni commented 6 months ago

I would vote for adding a warning to the README (maybe including a link to this issue).

Sounds legit! Let's do it this way 👍🏽