DabblerDK / MEP-SW-ESP32

Multipurpose Expansion Port (MEP) ESP32 Software for OSGP Smart Meters

"Invalid sequence number" on all packages #7

Closed Jellybeanz1980 closed 1 year ago

Jellybeanz1980 commented 1 year ago

I get "0x0C: Invalid sequence number" on all packages from the meter. When "BT01 General Manufacturer Information" or "BT52 UTC Clock" is requested using the pre-defined MEP package, the correct data is returned, but all other requests fail with an "invalid sequence number", so there is no usable meter data. It seems that after reading the time and manufacturer information, program execution never gets past initialization because of the sequence numbering problem.

When the commands in the setup, except the reading of the time and the manufacturer information, are commented out, reading the meter readings then fails with the sequence number error. Is there any difference between the commands that read the clock and the manufacturer data and the other commands that could cause the error?

glynge commented 1 year ago

The requests should be similar - a simple full read of a table. To debug further I'll need the full table data from the "Home" page. The meter actually returns the correct sequence number to use next along with the 0x0C error, and the module then re-issues the failed request with that sequence no... So this is weird and not something we've seen before. Note: 0x0C is common on the first package sent to the meter, as we don't know the correct sequence... After that I think it might change in some rare cases (maybe if the meter is also communicating with its uplink or via IR, but I've never seen it fail after the first initial request).

Please provide the full table data so I can debug further.

Jellybeanz1980 commented 1 year ago

MEP Table

I tried commenting out all the configuration requests and replacing them with multiple "get time" requests. Only the very first package returned "success".

As seen in the attached table, there is no problem with the first two requests, while the configuration packages return errors. I observed that the meter doesn't react to a package the first time, but it answers on the second try. The second try is the same request as the first, without an increased sequence number, while the meter seems to increase the sequence number on the second try. This is the reason for the error. I tried to add one to the sequence number on the second try, but that requires major modifications to the code, and I don't understand the flow of the code.

glynge commented 1 year ago

Hi. I think you have identified a bug in our software.

Apparently we do not initialize properly, because the initial requests are missing a "read length" count. My module is generating requests without the length until we successfully read table ET50. Then it starts including the length bytes.

The "randomness" of this error probably depends on the software in the meter, as the documentation says this regarding the length bytes (page 12): "If this field is missing or less than 2 bytes, then the interpreted value will be indeterminate. No error will be returned in this case.". So on some meters (like ours) the initial requests will work without the length bytes, but on others they will cause errors...

We'll work on a bug fix and get back to you...

glynge commented 1 year ago

Issue closed, as the bug has been fixed in the main branch.

glynge commented 1 year ago

Apparently the bug fix didn't solve the original issue as expected, so reopening the issue. See duplicate issue #9

glynge commented 1 year ago

Hi @Jellybeanz1980,

Sorry our bug-fix didn't solve your issue. I have reopened your original issue to keep track of your bug.

Note: Our bug-fix DID solve a protocol error on our part, and from your latest screenshot on #9 it is clear that your request length is now "00 11", which is correct (before it was "00 0f", which is incorrect for the requests).

It is by design that our software re-sends the request without increasing the sequence no. when there is no response from the meter. We don't know whether the request was lost before reaching the meter or the response was lost before we got it. It is pure guessing, but we are following the documentation regarding this (page 21): "Response timeout The MEP should use a response timeout of 500 ms. The timeout interval should start after the last byte of the request is sent until the first 2 bytes (non-0 packet length) of the response is received. If the response is not received, the MEP should try to deliver the packet again using the same packet contents (sequence #, digest, etc) as the original packet."

We have NEVER had issues with this approach previously, and we know of more than 100 installations running this software - both Gen. 3 and Gen. 4 meters. So something must be different in your installations.

A few questions that can help us solve your issue:

Indexes 0 and 1 are used in our setup to check whether communication with the meter is up. As soon as a correct reply is received, we queue the requests in indexes 2-7. I notice that index 0 on your latest screenshot on #9 returns the "0c" error code followed by the next expected sequence no. "06 dc 97 e9". In index 1 the same request is then retried with the received sequence no. "06 dc 97 e9", which is 100% correct. In index 2 BT01 is read, and the sequence no. is increased to "06 dc 97 ea" - again 100% correct. In index 3 BT21 is then read, with the increased sequence no. of "06 dc 97 eb". Apparently the meter does not respond, which after a timeout causes the request to be sent again (Tries increases by 1). We then get a "0c" with the next sequence no. stated as "06 dc 97 ec" (which tells us that the original package was received by the meter, although we didn't receive a response). We then re-issue the request with the new sequence no. in index 8. In index 4 ET03 is read, but it fails the same way as index 3 and is re-issued with the new sequence no. in index 9. ...and the same in index 5.
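The retry and resynchronisation flow described above can be sketched roughly as follows. This is only an illustration, not the project's actual code: `PendingRequest`, `onTimeout`, `onErrorReply` and `nextSequence` are hypothetical names, and the payload layout (one status byte followed by four sequence bytes, most significant first) is an assumption based on the "0c" plus "06 dc 97 ec" pattern seen in the table.

```cpp
#include <stddef.h>
#include <stdint.h>

// Error code named in this thread: "0x0C: Invalid sequence number".
constexpr uint8_t kInvalidSequence = 0x0C;

// Hypothetical bookkeeping for one outstanding request.
struct PendingRequest {
    uint32_t sequence;  // sequence number placed in the outgoing packet
    unsigned tries;     // number of times the packet has been sent
};

// No reply within the timeout: the identical packet is sent again.
// The sequence number is deliberately left unchanged.
void onTimeout(PendingRequest& request) {
    ++request.tries;
}

// Assumed payload layout: [status][4 sequence bytes, most significant first].
// On a 0x0C status, adopt the sequence number the meter reports as expected
// and return true so the caller can re-issue the request with it.
bool onErrorReply(PendingRequest& request, const uint8_t* payload, size_t length) {
    if (length < 5 || payload[0] != kInvalidSequence) {
        return false;
    }
    request.sequence = (static_cast<uint32_t>(payload[1]) << 24) |
                       (static_cast<uint32_t>(payload[2]) << 16) |
                       (static_cast<uint32_t>(payload[3]) << 8) |
                       static_cast<uint32_t>(payload[4]);
    return true;
}

// After a successful reply, the next request uses the following number.
uint32_t nextSequence(const PendingRequest& request) {
    return request.sequence + 1;
}
```

In the index 3 case, a request sent with "06 dc 97 eb" times out, is re-sent unchanged, and is then re-issued with "06 dc 97 ec" once the meter reports that value.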

You are correct that the package is originally sent and re-sent in line 386, but the timeout waiting for a reply is specified in line 275 (10000 millis). This line starts the block: "if(millis() - LastSentMillis > 10000) {". Before resending we try to pull the ENABLE pin down on the meter, turn the MAX3232 chip (on our board) off and on, and then pull the ENABLE pin up again. It is not best practice to use delays in a responsive UI, so this is something we need to improve. But adding this code improved stability a lot. If you want to test something, you could try to increase the timeout value or remove the if-block completely (the software will then wait for the reply forever).

I hope you can reply with some info on your hardware etc. as requested above, or that the above suggestion can help you solve the issue. Please let me know if you have any questions or if I can provide more info to help you debug this issue.
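For anyone experimenting with the timeout value, here is a small stand-alone sketch of the same elapsed-time check. It assumes a 32-bit unsigned millisecond counter, as Arduino's `millis()` provides on the ESP32; `hasTimedOut` and its parameter names are made up for illustration. Because the subtraction is unsigned, the comparison stays correct when the counter wraps around.

```cpp
#include <stdint.h>

// Returns true when more than timeoutMs milliseconds have passed since
// lastSentMs. Unsigned subtraction gives the right elapsed time even after
// the 32-bit counter has wrapped around to zero.
bool hasTimedOut(uint32_t nowMs, uint32_t lastSentMs, uint32_t timeoutMs) {
    return static_cast<uint32_t>(nowMs - lastSentMs) > timeoutMs;
}
```

With a 10000 ms timeout, `hasTimedOut(10001, 0, 10000)` is true and `hasTimedOut(10000, 0, 10000)` is false, matching the strict `>` in the quoted line.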

glynge commented 1 year ago

Hi again @Jellybeanz1980. Sorry if I was suddenly "shouting" with big bold text in my reply. GitHub apparently does that if you put 3 dashes in your text :-)

Jellybeanz1980 commented 1 year ago

Hi, Thank you very much for your effort. The meter is installed in the basement with very limited access possibilities for debugging. It would be very nice if the debug text could be saved in a log file to be retrieved from the web interface or through a telnet connection.

The schematic I use is as follows:

image

There isn't anything special, except that the MEP enable is inverted compared to the hardware that you use. I configured the correct pins and polarity in the software, thus data is received from the meter.
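As a rough illustration of what the polarity setting amounts to, the sketch below maps the desired meter state to a GPIO level. `EnableConfig` and `gpioLevelFor` are hypothetical names and are not taken from the project's configuration.

```cpp
// Hypothetical pin configuration; the field name is not from the project.
struct EnableConfig {
    bool inverted;  // true when the hardware inverts the ENABLE signal
};

// Returns the GPIO level (true = HIGH) that makes the meter see ENABLE in
// the requested state. With inverted hardware the level is flipped.
bool gpioLevelFor(bool meterEnabled, const EnableConfig& config) {
    return meterEnabled != config.inverted;
}
```

With `inverted` set to true, enabling the meter drives the GPIO LOW, which is the opposite of the non-inverted case.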

My observation is that there is no problem as long as only the meter clock and the manufacturer data are read. As soon as the configuration or the consumption data is requested, the meter stops responding to the first try. I even tried sending the manufacturer information request multiple times, up to 10 times, without any problems. My guess is that there isn't any communication problem caused by the MEP enable or the RS232 enable.

I'd be happy to try any suggestion. As following the program's execution flow is hard for me, I was not able to try any other methods. I'd like to try increasing the sequence number on the second try instead of re-sending the same packet.

Jellybeanz1980 commented 1 year ago

I tried removing the if-block at line 275. Execution got stuck, as the meter doesn't respond.

glynge commented 1 year ago

Regarding the if-block, that is exactly what I expected. No reply is received.

My best ideas at this point are:

  1. Measure the voltage level on the ENABLE pin. You probably need at least around 8-9 V - as far as I remember this power is used to drive the TXD circuit in the meter. If possible, disconnect your hardware from the ENABLE pin and put a 9 V battery between it and GND (I've done that previously during testing).

  2. Verify that you are not drawing more than 1 W from the 24 V MEP_POWER pin.

  3. Look at the debug output. It is unfortunately not trivial to write a log, but as I'm also not next to my meter, I connected a Raspberry Pi to the GND, debug TXD and debug RXD pins. I run Linux on it, can SSH to it and can start a terminal program (I use minicom) to look at the debug output.

  4. Use an oscilloscope to look at the signal from the meter and follow it to your ESP.

Hint: If you want to test your hardware outside the Meter, the software detects if you connect MEP TXD and MEP RXD. It will write something like "Requests are looping back to me. Are you debugging?" in the decoding column of the table.
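One simple way such a loopback check could work is to compare the bytes received with the request that was just sent. The sketch below is an assumption about the idea only, not the project's implementation, and `isLoopback` is a made-up name.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Returns true when the received bytes are an exact copy of the request that
// was just sent, which is what happens when MEP TXD is wired to MEP RXD.
bool isLoopback(const uint8_t* sent, size_t sentLength,
                const uint8_t* received, size_t receivedLength) {
    return sentLength > 0 &&
           sentLength == receivedLength &&
           memcmp(sent, received, sentLength) == 0;
}
```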

Let me know if I can provide more info to help you debug... And please also let me and others know if you figure this out - it might help someone in the future.

Jellybeanz1980 commented 1 year ago

Thank you for the reply. On my first attempt I had not enabled the meter, because the signal is inverted compared to your hardware. At that point the meter didn't respond at all: the first command, reading the time of the meter, didn't return. After inverting the signal in the code, the meter was able to respond. From this observation I know that enabling the meter works correctly.

The fact that the meter responds to the first two commands means that there is communication between the meter and the reader. As long as only these commands are sent to the meter, everything is OK. The problem occurs with the other commands (configuring and reading the consumption): the meter stops responding. Unless sending these commands causes the reader to draw more current, I don't think there is a supply issue. As long as the reader keeps sending "BT52: UTC Clock" and "BT01: General Manufacturer Information", the meter responds correctly to all the commands. I think that something is missing when sending the configuration commands, or these commands need to be sent twice with the correct sequence number.

You're referring to a document about the MEP packet specifications. Is it "078-0372-01BD_MEP_Client_DG-OSGP Alliance_update3A"?

Could all this trouble be because Konstant has closed the MEP port? In the email they say that the MEP port is available, but that they can close this port at any time, and they encourage using the optical port.

glynge commented 1 year ago

I'm aware that the two packets work perfectly, but I cannot think of anything else to try besides testing the enable signal. Clearly the meter receives and decodes the request (because it "sees" the sequence no. used), but the software never receives any answer. I admit it is a stretch, but another not very likely cause is your MEP key. Are you sure it is correct? (Again, I would expect no communication to be working - but I don't know what else to suggest :-)).

Yes, the MEP documentation I'm referring to is this one: https://github.com/OSGP-Alliance-MEP-and-Optical/Documentation/blob/main/078-0372-01BD_MEP_Client_DG-OSGP%20Alliance_update3A.pdf.

I have never seen a meter with a closed MEP port, so I really can't tell. But I would expect it to not function at all. I HAVE seen a meter with a defective MEP power supply, so I guess your issue could also be caused by a defective meter (again, not likely as some communication is working, but who knows...)

Note: I assume this is NOT a software problem because we have the software working in probably more than 100 installations. It would be very unlikely that it would only fail at your installation. The meter or (with all due respect for your skills :-) ) hardware failing is logically speaking more likely....

Another suggestion: You could try to describe the error on the discussion board here: https://github.com/orgs/OSGP-Alliance-MEP-and-Optical/discussions/categories/q-a. Some of the NES guys follow the discussions there, one of them might have an idea...

And a final idea: uStepper is currently working on their version of our board. When it is available, you could purchase one of those to see if it has the same error...

Let me know if I can provide more info. to help you debug...

Jellybeanz1980 commented 1 year ago

Thank you for your feedback. I guess I'll need to do some debugging and read the document. I have experience in designing circuits, but very limited programming skills. It is hard to do things correctly without exact information/data on the meter.

Initially I tried with a wrong MEP key, having converted the letters to upper-case by mistake. With the wrong key it doesn't work at all.

I'll try to find a way of debugging, and ask for more details about the meters in the discussion board that you mention. Thank you for simplifying the document to understand by studying the software :)

glynge commented 1 year ago

Hi Jellybeanz1980 Did you find the issue? Maybe others can learn from it?

Jellybeanz1980 commented 1 year ago

Hi Gert,

The issue is not solved yet. Two more hardware units will be tested on other meters. I managed to add telnet for remote debugging and some more debug printing to get closer to the issue. What I see from debugging is that the meter stops replying partway through long reply packages (it stops after 101 bytes when 150 bytes are expected). There are no problems with the short packages. I'll let you know if/when I manage to find a solution.
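A truncated reply like this can be flagged in debug output by comparing the number of bytes received with the length announced at the start of the packet. The sketch below is illustrative only: `checkProgress` and its types are hypothetical, and it assumes the first two bytes hold a big-endian count of the bytes that follow, which may not match the real MEP framing exactly.

```cpp
#include <stddef.h>
#include <stdint.h>

enum class RxState { NeedLength, Incomplete, Complete };

struct RxProgress {
    RxState state;
    size_t expectedTotal;  // total bytes expected, including the length field
    size_t missing;        // bytes still outstanding
};

// Assumption: buffer[0..1] is a big-endian length of the remaining bytes.
RxProgress checkProgress(const uint8_t* buffer, size_t received) {
    if (received < 2) {
        return {RxState::NeedLength, 0, 0};
    }
    const size_t declared = (static_cast<size_t>(buffer[0]) << 8) | buffer[1];
    const size_t total = declared + 2;
    if (received < total) {
        return {RxState::Incomplete, total, total - received};
    }
    return {RxState::Complete, total, 0};
}
```

Under that assumption, a reply announcing 148 payload bytes (150 in total) that stops after 101 bytes is reported as incomplete with 49 bytes missing.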

Jellybeanz1980 commented 1 year ago

Hello,

Just an update on this case. I've managed to get the reader to work. The cause of the error was the voltage level of the MEP enable signal. The voltage level was correct when idle, but as soon as communication was initiated, the meter somehow loaded the signal and it "discharged". This can be seen below:

RigolDS1

The purple trace is the output from the reader at RS232 level, the turquoise trace is the RX of the reader at TTL level, and the yellow trace is the RX of the reader at RS232 level. It is clearly seen that the RX voltage level is correct at the beginning but then decreases, probably falling below the threshold after a while, which explains the missing bytes at the end.

The solution was to increase the current that was delivered through the enable signal. The software part is OK.

glynge commented 1 year ago

Happy to hear it was solved. As far as I know the send circuit within the power meter is powered by the power you provide on the enable pin, so your explanation makes good sense.