gitolicious opened this issue 3 months ago
@eNBeWe, @passionsfrucht, @irgendwienet: would one of you share how you debug SML issues? What is the best way to "replay" the raw data from within ESPHome to identify the problematic line?
Hi @gitolicious
I used my PC, connected to an IR reader through a TTL-serial-to-USB converter, to record a few seconds of data. Then I extracted some SML records by hand. You could probably also use an ESP to record the raw serial data.
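Cutting the records out of a recording can also be automated. As far as I know, SML transport v1 starts every file with `1B 1B 1B 1B 01 01 01 01` and ends it with `1B 1B 1B 1B 1A` followed by three more bytes (number of padding bytes plus a two-byte CRC). The sketch below is a hypothetical helper, `extract_sml_frames`, which is not part of ESPHome. It returns every complete frame in a raw capture and skips frames that were cut off. It does not handle escaped `1B` sequences inside the payload and does not verify the CRC.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Hypothetical helper: split a raw serial capture into SML transport v1 frames.
// Start: 1B 1B 1B 1B 01 01 01 01. End: 1B 1B 1B 1B 1A + 3 bytes (padding count, CRC).
// Not handled: escaped 1B sequences inside the payload; the CRC is not checked.
std::vector<Bytes> extract_sml_frames(const Bytes& capture) {
  static const uint8_t kStart[] = {0x1B, 0x1B, 0x1B, 0x1B, 0x01, 0x01, 0x01, 0x01};
  static const uint8_t kEnd[] = {0x1B, 0x1B, 0x1B, 0x1B, 0x1A};
  const std::ptrdiff_t kTrailer = 3;  // bytes following the 1A marker
  std::vector<Bytes> frames;
  auto pos = capture.begin();
  while (true) {
    auto start = std::search(pos, capture.end(), std::begin(kStart), std::end(kStart));
    if (start == capture.end()) break;
    auto body = start + sizeof(kStart);
    auto end = std::search(body, capture.end(), std::begin(kEnd), std::end(kEnd));
    if (end == capture.end()) break;  // last frame in the capture is incomplete
    // Another start marker before the end marker means the earlier frame was cut off.
    auto restart = std::search(body, end, std::begin(kStart), std::end(kStart));
    if (restart != end) { pos = restart; continue; }
    auto marker_end = end + sizeof(kEnd);
    if (capture.end() - marker_end < kTrailer) break;
    frames.emplace_back(start, marker_end + kTrailer);
    pos = marker_end + kTrailer;
  }
  return frames;
}
```

Each returned frame can then be fed to the parser on the PC, one at a time.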
Finally, I managed to get the SML code running on Windows in Visual Studio and loaded these files, which let me debug the code on Windows. I should mention that I come from a C# background and have little experience with plain C.
I found that solution of mine, where SmlConsoleApplication.cpp is the entry point and the other files come from the esphome repo. Maybe this could be a starting point for you: sml-debug.zip
This PDF was also useful: TR-03109-1_Anlage_Feinspezifikation_Drahtgebundene_LMN-Schnittstelle_Teilb.pdf
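The type-length (TL) encoding is the key to reading SML by hand. As I understand the spec, bit 7 of a TL byte signals that another TL byte follows. Bits 6-4 give the type: 000 octet string, 100 boolean, 101 signed integer, 110 unsigned integer, 111 list. The low nibble holds the length. For lists, the length is the number of entries. For all other types, it is the size of the whole field including the TL bytes. Here is a minimal decoder sketch under those assumptions. `decode_tl`, `SmlType` and `payload_size` are hypothetical names, not ESPHome's API.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

enum class SmlType { OctetString, Boolean, SignedInt, UnsignedInt, List, Unknown };

struct TypeLength {
  SmlType type;
  size_t length;    // list: number of entries; otherwise: field size incl. TL bytes
  size_t tl_bytes;  // number of TL bytes consumed
};

// Decode the TL field starting at buf[pos] (sketch of the basic cases only).
TypeLength decode_tl(const std::vector<uint8_t>& buf, size_t pos) {
  if (pos >= buf.size()) throw std::out_of_range("TL field beyond buffer");
  const uint8_t first = buf[pos];
  SmlType type;
  switch ((first >> 4) & 0x07) {
    case 0: type = SmlType::OctetString; break;
    case 4: type = SmlType::Boolean; break;
    case 5: type = SmlType::SignedInt; break;
    case 6: type = SmlType::UnsignedInt; break;
    case 7: type = SmlType::List; break;
    default: type = SmlType::Unknown; break;
  }
  size_t length = first & 0x0F;
  size_t n = 1;
  // Bit 7 set: another TL byte follows and contributes 4 more length bits.
  while (buf[pos + n - 1] & 0x80) {
    if (pos + n >= buf.size()) throw std::out_of_range("truncated TL field");
    length = (length << 4) | (buf[pos + n] & 0x0F);
    ++n;
  }
  return {type, length, n};
}

// Payload bytes following the TL bytes (not meaningful for lists).
size_t payload_size(const TypeLength& tl) {
  return tl.length >= tl.tl_bytes ? tl.length - tl.tl_bytes : 0;
}
```

For example, `77` opens a list of 7 entries, `07 01 00 01 08 00 ff` is a 6-byte octet string (the OBIS code 1-0:1.8.0), and `62 1e` is a one-byte unsigned value.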
I hacked together a small main() function inside sml_parser into which I could feed hard-coded byte streams. I also built a small library of SML files according to the BMI specifications.
@gitolicious As far as I can tell, all the messages you posted pass the parser, so I guess the issue must be either in the private data (the xx bytes) or in the surrounding envelope. Could you send a raw dump? It should still be okay if you change some bytes (swap some numbers for other numbers), as long as you keep the "class" of each hex value (numbers, digits, etc.).
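One way to rule out the envelope is to recompute the frame checksum. The last two bytes of an SML transport frame carry a CRC16 over everything before them. As far as I know, this is the CRC-16/X-25 variant: reflected polynomial 0x8408, initial value 0xFFFF, final XOR 0xFFFF, with the standard check value 0x906E for the ASCII string "123456789". Treat that variant as an assumption. This sketch also does not assert the byte order in which a meter transmits the CRC. `crc16_x25` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>

// CRC-16/X-25: reflected poly 0x8408, init 0xFFFF, final XOR 0xFFFF.
// Assumption: this is the checksum variant closing an SML transport frame.
uint16_t crc16_x25(const uint8_t* data, size_t len) {
  uint16_t crc = 0xFFFF;
  for (size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc & 1) ? static_cast<uint16_t>((crc >> 1) ^ 0x8408)
                      : static_cast<uint16_t>(crc >> 1);
    }
  }
  return static_cast<uint16_t>(crc ^ 0xFFFF);
}
```

To check a captured frame, compute the CRC over all bytes except the last two and compare the result with those two bytes. If you are unsure about the byte order, try both.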
This is valuable input, thanks guys! Let me see if I can find the issue myself with @irgendwienet's helper code. Otherwise, I might take you up on your offer to look into the full dump.
Alright, debugging on the PC turned out to be a very good idea and is much more comfortable than debugging on the ESP. Unfortunately, it didn't reveal the issue: the full dump decodes correctly, whereas I would have expected a failure at the point where the ESP crashes.
Based on the approach described above, I ended up with this code:
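```cpp
#include <algorithm>
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>
#include "sml_parser.h"  // copied from esphome/components/sml

// Minimal helper: turn "7605..." (no separators) into raw bytes
std::vector<uint8_t> hex_to_bytes(const std::string& hex) {
  std::vector<uint8_t> bytes;
  for (size_t i = 0; i + 1 < hex.size(); i += 2)
    bytes.push_back(static_cast<uint8_t>(std::stoul(hex.substr(i, 2), nullptr, 16)));
  return bytes;
}

int main(int argc, char* argv[])
{
  std::string hexString = "76 05 00 64 21 ...";  // paste the full captured dump here
  // remove all spaces from the hex string
  hexString.erase(std::remove(hexString.begin(), hexString.end(), ' '), hexString.end());
  // convert hex string to byte array
  std::vector<uint8_t> byteArray = hex_to_bytes(hexString);
  // parse bytes to SML
  esphome::sml::SmlFile sml_file = esphome::sml::SmlFile(byteArray);
  std::vector<esphome::sml::ObisInfo> obis_info = sml_file.get_obis_info();
  // print result to stdout
  std::cout << "OBIS message size: " << obis_info.size() << std::endl << std::endl;
  for (const auto& info : obis_info) {
    std::cout << std::left << std::setw(12) << std::setfill(' ') << info.code_repr() << "| ";
    // switch back to right alignment, otherwise 0x04 is padded on the wrong side ("40")
    std::cout << std::right << std::hex;
    for (const auto& byte : info.value) {
      std::cout << std::setw(2) << std::setfill('0') << static_cast<int>(byte);
    }
    std::cout << std::dec << std::endl;
  }
  return 0;
}
```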
Output:
```
OBIS message size: 24
1-0:96.50.1 | 5a5041
1-0:96.1.0  | a010xxxxxxxxxxxxxxxx
1-0:1.8.0   | 000000000013f8e3
1-0:2.8.0   | 0000000000109ce0
1-0:14.7.0  | 000000000000138b
1-0:0.2.0   | 3031
1-0:96.90.2 | 7249a01d
1-0:97.97.0 | 00000000
1-0:96.5.0  | 001c1040
1-0:16.7.0  | 0000000000009061
1-0:36.7.0  | 0000000000000015
1-0:56.7.0  | 0000000000001012
1-0:76.7.0  | 0000000000008039
1-0:32.7.0  | 0000000000005dab
1-0:52.7.0  | 0000000000005d98
1-0:72.7.0  | 0000000000005d73
1-0:31.7.0  | 0000000000000073
1-0:51.7.0  | 000000000000504b
1-0:71.7.0  | 000000000000225e
1-0:81.7.1  | 000000000000409f
1-0:81.7.2  | 000000000000904a
1-0:81.7.4  | 000000000000d06e
1-0:81.7.15 | 000000000000d0e7
1-0:81.7.26 | 000000000000e0c0
```
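In the output above, bytes below 0x10 appear to have their digits swapped. For example, 96.5.0 shows `001c1040` where the Tasmota decode in the issue description below shows `001c0104`, and 81.7.4 shows `d06e` for what is most likely `0d6e` = 3438, i.e. 343.8°. The cause is that `std::left` from the OBIS column stays active, so `setfill('0')` pads the single hex digit on the right. Here is a small sketch of helpers that format the value bytes correctly and turn them into numbers. `to_hex`, `to_int64` and `scaled` are hypothetical names. The scaler is assumed to be a power-of-ten exponent, as used by SML.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Zero-padded hex with explicit right alignment: if std::left is still active,
// setfill('0') pads on the right and 0x04 comes out as "40".
std::string to_hex(const std::vector<uint8_t>& bytes) {
  std::ostringstream out;
  out << std::right << std::hex << std::setfill('0');
  for (uint8_t b : bytes) out << std::setw(2) << static_cast<int>(b);
  return out.str();
}

// Big-endian bytes to integer; sign-extends when the SML type is a signed integer.
int64_t to_int64(const std::vector<uint8_t>& bytes, bool is_signed) {
  uint64_t raw = 0;
  for (uint8_t b : bytes) raw = (raw << 8) | b;
  const size_t bits = bytes.size() * 8;
  if (is_signed && bits > 0 && bits < 64 && ((raw >> (bits - 1)) & 1))
    raw |= ~uint64_t{0} << bits;
  return static_cast<int64_t>(raw);
}

// Apply the SML scaler (power-of-ten exponent): 3438 with scaler -1 -> 343.8.
double scaled(int64_t value, int scaler) {
  return static_cast<double>(value) * std::pow(10.0, scaler);
}
```

Setting `std::right` inside the helper keeps the formatting independent of whatever stream state the caller left behind.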
I guess this means I will need to run it on an ESP directly and see how it performs there. Might it be memory related? I am using an ESP01 (1 MB) with a Hichi TTL IR read head ("IR Lesekopf").
Well, a memory issue would be plausible. I don't think the code is particularly well optimized. I have it running on an Olimex ESP32-PoE, so memory normally shouldn't be an issue. Do you have any other ESP boards to try?
Yes, plenty 🤓 Will run the code on NodeMCU, Wemos D1 and ESP32 variants tomorrow and see if it works with more memory.
I was able to replicate the issue "offline" now. (Yeah, most energy meters are not located in the hacker-friendliest places...)
I wired two ESPs together (UART TX -> UART RX) and had one send the recorded bytes of my smart meter's SML message to the other, replicating the real SML receiver as closely as possible.
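For a replay setup like this, the recorded hex dump has to be turned back into raw bytes before the sender writes it to its UART. Here is a minimal sketch of a tolerant converter, a hypothetical `parse_hex_dump` helper. It accepts spaces, colons, dashes or any other non-hex separator between byte pairs. Feed it only the hex part of each log line, because timestamps and log tags contain hex digits too. `0x` prefixes are not supported.

```cpp
#include <cctype>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Turn a hex dump such as "1b 1B:01-ff" into bytes, skipping separator characters.
// Throws if an odd number of hex digits is found.
std::vector<uint8_t> parse_hex_dump(const std::string& text) {
  std::vector<uint8_t> bytes;
  int high = -1;  // pending high nibble, -1 if none
  for (char c : text) {
    const unsigned char uc = static_cast<unsigned char>(c);
    if (!std::isxdigit(uc)) continue;
    const int nibble = std::isdigit(uc) ? uc - '0' : std::tolower(uc) - 'a' + 10;
    if (high < 0) {
      high = nibble;
    } else {
      bytes.push_back(static_cast<uint8_t>((high << 4) | nibble));
      high = -1;
    }
  }
  if (high >= 0) throw std::invalid_argument("odd number of hex digits");
  return bytes;
}
```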
The serial log shows:
```
Unhandled C++ exception: OOM
```
So just as we expected - it looks like a memory issue.
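Since the OOM points at heap exhaustion, here is one general technique for keeping a parser's footprint small on an ESP8266. Instead of copying every node's bytes into its own `std::vector`, walk the receive buffer in place with offsets. I have not verified how the ESPHome component allocates, so this only illustrates the idea. `ByteView` and `skip_node` are hypothetical names, and the TL rules are the same assumptions as in the spec discussion above. A real implementation should also limit the nesting depth, because the recursion uses stack space.

```cpp
#include <cstddef>
#include <cstdint>

// Non-owning view into the receive buffer: no per-node allocation.
struct ByteView {
  const uint8_t* data;
  size_t size;
};

// Skip one SML node at `pos` without copying it. Returns the offset just past the
// node, or view.size + 1 if the node is malformed or truncated. Counts all nodes
// visited (lists and their children) in `nodes`.
size_t skip_node(ByteView view, size_t pos, size_t& nodes) {
  const size_t kError = view.size + 1;
  if (pos >= view.size) return kError;
  const uint8_t first = view.data[pos];
  const bool is_list = ((first >> 4) & 0x07) == 7;
  size_t length = first & 0x0F;
  size_t n = 1;
  while (view.data[pos + n - 1] & 0x80) {  // multi-byte TL field
    if (pos + n >= view.size) return kError;
    length = (length << 4) | (view.data[pos + n] & 0x0F);
    ++n;
  }
  ++nodes;
  if (!is_list) {
    // For non-list types the length includes the TL bytes themselves.
    if (length < n || pos + length > view.size) return kError;
    return pos + length;
  }
  size_t next = pos + n;
  for (size_t i = 0; i < length; ++i) {  // for lists, length = number of entries
    next = skip_node(view, next, nodes);
    if (next > view.size) return kError;
  }
  return next;
}
```

Values can then be read directly from `view.data` once the interesting OBIS entry has been located, so only the final results need their own storage.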
Is there anything I can do on my end to dig deeper into the issue, or would a major improvement in memory handling within the SML component be the only way to get it running on ESP8266?
Is there a simple online tool to create SML messages? I could then try and reduce the size of the message to see "how far away" from a working solution I am.
Last resort would be to upgrade the ESP attached to my smart meter to an ESP32. This requires rework on the 3D printed case and wiring which I would like to avoid if possible.
Hey, thanks for taking this up!
Unfortunately, I'm not able to look into this right now, as I'm not near the reader and remote access is difficult at best. But if additional binary dumps of the streams are needed, I'm happy to provide them at the end of the week.
As an additional data point regarding memory sizes: I'm using the esp32dev board definition, which only defines 320 kB of RAM, because my board is a cheap no-name model. The board definitely has more RAM, so I can raise the limits manually, or use one of the better boards lying around to check whether such a simple swap helps.
Another note: the log regularly shows warnings that processing the SML data takes too long, e.g.:
```
[20:32:44][W][component:237]: Component sml took a long time for an operation (69 ms).
[20:32:44][W][component:238]: Components should block for at most 30 ms.
```
Which points to too much data again, IMO.
@gitolicious Nice lab setup and nice find. Too bad that it is indeed memory related. I don't know of any online SML test generator; I built my test data manually. The SML specifications are actually not toooooo bad to read, so with some patience you could dissect the messages and strip them down.
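Building test data by hand boils down to emitting TL-prefixed fields. Here is a sketch of hypothetical helpers (`sml_unsigned`, `sml_octets`, `sml_list`) for assembling such fields, for example to rebuild a message with only some of its valList entries. It assumes the TL rules from the spec: type in bits 6-4 of the TL byte, and a length that includes the TL byte except for lists. Only single-byte TL fields are produced. A complete message would also need the surrounding message structure and the transport frame with its CRC, which this does not cover.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Unsigned integer in `width` bytes, big-endian: TL = 0x6N with N = width + 1.
Bytes sml_unsigned(uint64_t value, int width) {
  if (width < 1 || width > 8) throw std::invalid_argument("width must be 1..8");
  Bytes out{static_cast<uint8_t>(0x60 | (width + 1))};
  for (int i = width - 1; i >= 0; --i) out.push_back(static_cast<uint8_t>(value >> (8 * i)));
  return out;
}

// Octet string: TL = 0x0N with N = payload size + 1 (single TL byte only).
Bytes sml_octets(const Bytes& payload) {
  if (payload.size() > 14) throw std::invalid_argument("needs a multi-byte TL field");
  Bytes out{static_cast<uint8_t>(payload.size() + 1)};
  out.insert(out.end(), payload.begin(), payload.end());
  return out;
}

// List: TL = 0x7N with N = number of entries; the encoded entries follow directly.
Bytes sml_list(const std::vector<Bytes>& entries) {
  if (entries.size() > 15) throw std::invalid_argument("needs a multi-byte TL field");
  Bytes out{static_cast<uint8_t>(0x70 | entries.size())};
  for (const auto& e : entries) out.insert(out.end(), e.begin(), e.end());
  return out;
}
```

For example, `sml_octets({0x01, 0x00, 0x01, 0x08, 0x00, 0xff})` yields the objName field `07 01 00 01 08 00 ff` for 1-0:1.8.0.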
@passionsfrucht The warning about the component taking too long is already "documented" in the corresponding issue. I get these messages even with my probe on my smart meter, which sends very few values (thanks to my energy provider, which gave me a seriously cut-down meter). Maybe this is more a problem of the serial transmission: the UART port runs at 9600 baud, which with 8N1 framing (10 bits per byte) is 960 bytes/s, so you can "only" transmit about 29 bytes of data within 30 ms before the component is flagged as "too slow". With additional computation overhead, the 30 ms are used up quickly.
After finding an ESP32-C3 SuperMini in a drawer, I decided to replace my ESP01 with it. It only needed a minor change to my 3D-printed case and three short wires, and was definitely easier than hunting memory leaks in the SML library.
Btw: the (expected) component warning from ESPHome shows that the SML lib takes 100-150 ms to parse these long messages:
```
[W] [component:237] Component sml took a long time for an operation (105 ms).
[W] [component:238] Components should block for at most 30 ms.
```
What do you think: Should I leave this issue open for others to find it - or even someone brave enough to dig into the memory issues - or should I close it as at least for me everything works fine again after the hardware upgrade?
It will be marked stale and auto-close anyway after some time.
But I guess someone should dig into it at some point... Then again, SML is purely a German protocol, and I guess the affected user base is rather limited.
The problem
TLDR: SML parser crashes when my smart meter ZPA GH305 sends its extended dataset. Raw data below.
Long version:
As there is a lot of activity around the SML parser at the moment (https://github.com/esphome/esphome/pull/6148, https://github.com/esphome/esphome/pull/7235, https://github.com/esphome/issues/issues/6071), I want to share my raw data as requested by @eNBeWe and hope we can find a solution together.
I just had a new ZPA GH305.D-S2-01.00-30G installed by Westnetz (Germany). For reference, here is the manual from another network operator (Netzbetreiber) that uses the same smart meter: ZPA GH305, and the Tasmota config: https://tasmota.github.io/docs/Smart-Meter-Interface/#zpa-gh305-sml.
ESPHome runs fine and decodes the SML with the reduced dataset (manufacturer code, ID, total consumption, total delivery), but crashes when I enter the PIN and enable the extended dataset with the INFO switch, which adds a lot of other values to the SML message.
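The capture in the next paragraph was filtered on entries starting with `77 07 01 00`. The same filter can be applied offline to a raw dump to list which OBIS codes a meter sends, for example with the reduced versus the extended dataset. In that byte pattern, `77` opens a valList entry with 7 elements, `07` introduces the 6-byte OBIS code, and `01 00` are its A and B groups. Here is a sketch of a hypothetical `find_obis_codes` helper. Because the pattern can also occur by chance inside other data, its hits are only candidates.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Find valList entries whose objName starts with OBIS A=1, B=0 (byte pattern
// 77 07 01 00) and return the codes as "A-B:C.D.E", like ESPHome's log output.
std::vector<std::string> find_obis_codes(const std::vector<uint8_t>& data) {
  std::vector<std::string> codes;
  for (size_t i = 0; i + 8 <= data.size(); ++i) {
    if (data[i] != 0x77 || data[i + 1] != 0x07 || data[i + 2] != 0x01 || data[i + 3] != 0x00)
      continue;
    const uint8_t* o = &data[i + 2];  // 6 OBIS bytes: A B C D E F
    codes.push_back(std::to_string(o[0]) + "-" + std::to_string(o[1]) + ":" +
                    std::to_string(o[2]) + "." + std::to_string(o[3]) + "." +
                    std::to_string(o[4]));
  }
  return codes;
}
```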
With the SML parser disabled and the UART debug log enabled, I captured the following (filtered to OBIS entries starting with `77 07 01 00`; a few details are censored for privacy, let me know if you need the full dump):
decoded by [https://tasmota-sml-parser.dicp.net/](https://tasmota-sml-parser.dicp.net/)

| OBIS (hex) | OBIS | Name | Value | Unit | Parsed |
|---|---|---|---|---|---|
| 0x010000020000 | 0.2.0 | Unknown data type | 01 | Unknown unit | 01 Unknown unit (Unknown data type) |
| 0x0100010800ff | 1.8.0 | Meter reading total | 1308946 | Wh | 130894.6 Wh (Meter reading total) |
| 0x0100020800ff | 2.8.0 | Active energy total | 105696 | Wh | 10569.6 Wh (Active energy total) |
| 0x01000e0700ff | 14.7.0 | Grid frequency | 5002 | Hz | 50.02 Hz (Grid frequency) |
| 0x0100100700ff | 16.7.0 | Present active power | 2395 | W | 2395 W (Present active power) |
| 0x01001f0700ff | 31.7.0 | Current L1 | 115 | A | 0.115 A (Current L1) |
| 0x0100200700ff | 32.7.0 | Voltage L1 | 23975 | V | 239.75 V (Voltage L1) |
| 0x0100240700ff | 36.7.0 | Active power L1 | 21 | W | 21 W (Active power L1) |
| 0x0100330700ff | 51.7.0 | Current L2 | 1342 | A | 1.342 A (Current L2) |
| 0x0100340700ff | 52.7.0 | Voltage L2 | 23950 | V | 239.5 V (Voltage L2) |
| 0x0100380700ff | 56.7.0 | Active power L2 | 270 | W | 270 W (Active power L2) |
| 0x0100470700ff | 71.7.0 | Current L3 | 8791 | A | 8.791 A (Current L3) |
| 0x0100480700ff | 72.7.0 | Voltage L3 | 23914 | V | 239.14 V (Voltage L3) |
| 0x01004c0700ff | 76.7.0 | Active power L3 | 2103 | W | 2103 W (Active power L3) |
| 0x0100510701ff | 81.7.1 | Phase angle voltages L1/L2 | 1183 | ° | 118.3° (Phase angle voltages L1/L2) |
| 0x0100510702ff | 81.7.2 | Phase angle voltages L1/L3 | 2377 | ° | 237.7° (Phase angle voltages L1/L3) |
| 0x0100510704ff | 81.7.4 | Phase angle current/voltage L1 | 3438 | ° | 343.8° (Phase angle current/voltage L1) |
| 0x010051070fff | 81.7.15 | Phase angle current/voltage L2 | 3560 | ° | 356.0° (Phase angle current/voltage L2) |
| 0x010051071aff | 81.7.26 | Phase angle current/voltage L3 | 3597 | ° | 359.7° (Phase angle current/voltage L3) |
| 0x0100600100ff | 96.1.0 | Unknown data type | xxxxxxxxxxxxxxxxxxxx | Unknown unit | xxxxxxxxxxxxxxxxxxxx Unknown unit (Unknown data type) |
| 0x0100600500ff | 96.5.0 | Unknown data type | 001c0104 | Unknown unit | 001c0104 Unknown unit (Unknown data type) |
| 0x010060320101 | 96.50.1 | Unknown data type | ZPA | Unknown unit | ZPA Unknown unit (Unknown data type) |
| 0x0100605a0201 | 96.90.2 | Unknown data type | xxxxxxxx | Unknown unit | xxxxxxxx Unknown unit (Unknown data type) |
| 0x0100616100ff | 97.97.0 | Unknown data type | 00000000 | Unknown unit | 00000000 Unknown unit (Unknown data type) |

Which version of ESPHome has the issue?
2024.7.3, same with 2024.6.0
What type of installation are you using?
Home Assistant Add-on
Which version of Home Assistant has the issue?
2024.8.1
What platform are you using?
ESP8266
Board
ESP01
Component causing the issue
sml
Example YAML snippet
No response
Anything in the logs that might be useful for us?
No response
Additional information
No response