DCC-EX / CommandStation-EX

EX-CommandStation firmware from DCC-EX. Includes support for WiFi and a standalone WiThrottle server. A complete re-write of the original DCC++.
https://dcc-ex.github.io/
GNU General Public License v3.0
155 stars 107 forks source link

Make EngineDriver work #221

Closed Trusty77 closed 2 years ago

Trusty77 commented 2 years ago

The program EngineDriver for Android, using the same protocol as WiThrottle, do not handle heartBeat. So in this case, remove the timings sent to the throttle ("1" and "10").

Asbelos commented 2 years ago

Please explain why this change is necessary when EngineDriver on android has been working happily with this code for over a year.

mstevetodd commented 2 years ago

@Trusty77 I am very familiar with EngineDriver. Feel free to contact me directly and we can investigate whatever you were seeing and why you felt this change would be beneficial. --SteveT

Trusty77 commented 2 years ago

For me, in my case of usage, EngineDriver on my Nokia 8 Is disconnecting/connecting contiuously every 10s.. To be clear, I use the branch ESP32 from version 3, in Wifi AP mode. Perhaps the problem does not happen on 328 chips, or in WIFI local network mode. A friend with the same version of CommandStation on ESP32 and EngineDriver have the same problem. WiThrottle works perfectly.

mstevetodd commented 2 years ago

Note that this PR is not against the ESP32 branch, rather against the master branch. I don't see how turning off the heartbeat would have any effect on the lost connection, rather it would just hide the fact that the connection was lost. But, if you really want to turn off the heartbeat on your copy, just set the timeout value to a very high number of seconds, here: https://github.com/DCC-EX/CommandStation-EX/blob/ESP32/WiThrottle.h#L40

Trusty77 commented 2 years ago

I have proposed this modification in the master branch because i thank that the problem was also there... I dont have the hardware to test on a Uno/Mega... The connection is not really cut, EngineDriver continue to work, but WiThrottle consider that EngineDriver dont respond to its heartbeat and stop the connexion. Ten second later, the ESP32 retrieve the connection, with the same ip, reconnect immediately and works for 10 seconds. Et ceatera. I have had exactly the same symptom in my implementation of WiThrottle protocol for the project Locoduino/Labox based on my library DCCpp, itself based on the original DCC++ ... The solution was to remove the heartbeat. Even without the heartbeat, there is no problem to detect by the Wifi protocol the end of a connection.

Asbelos commented 2 years ago

Since we don't have this problem on a mega, as far as i can tell, I'm wondering why the esp32 experimental branch has this problem, and is it something to do with the esp32 wifi or are you simply using an out of date engine driver version because we know Ed Does support the timeout.

mstevetodd commented 2 years ago

@Trusty77 To debug this, please turn on the Log Capture in EngineDriver prior to connecting, go through minimal steps to recreate the problem, then close EngineDriver. Then post or email me the EngineDriver log file.

Trusty77 commented 2 years ago

I did not realize that I discuss with the creator of EngineDriver ! So if someone could help me, this is you. EngineDriver version is 2.31.139. I have tested with a connection to my local network, and in AP mode. Same result in both cases. Here one of the log files i have done.

logcat1648243011221[2].txt

Ash-4 commented 2 years ago

@Trusty77 I see that you are using branch ESP32-Ash. I also had the problem. So I added your 5-line fix to exclude the heartbeat in my local copy. And it worked for me.

Keep in mind that ESP32-Ash is not ready for anything more than testing. I am told that the waveform is somewhat jittery, although I don't know how to confirm it. I made a few changes which eliminated the errors I saw with logic analyzer, and am able to reliably read CVs. I'd suggest you join the discussion on discord where we can discuss your implementation, etc.: https://discord.gg/PuPnNMp8Qf

Perhaps Steve will see something in your log. He is also on discord.

Current path forward on ESP32 development will include testing with Engine Driver, so that there is no disconnect/reconnect problem.

habazut commented 2 years ago

I am quite positive that the patch is not doing what you think it does as you are mixing up the concept of stopping the loco when the hearbeat goes away and the sending of the "I want you to send hearbeat every X seconds" from server to client. Then you have somehow mixed up command parsing and if there would be 2 letter commands you solution would crash. There probably IS a bug somewhere but this is not the right solution. I am not sure how we should handle the situation with a TCP connection that is so bad that there is say a 30 sec delay for a WiThrottle message but the connection is not completely dead and so to say will recover and send new chars. I think we have to think about this and add more diag code for this situation.

Harald.

mstevetodd commented 2 years ago

@Trusty77 thanks for the log. The log shows two different heartbeat values being sent from esp to enginedriver. The normal 10s one works fine. The subsequent 1s heartbeat seems too fast for the esp to respond. Find and change that "1" to "4" and see what results. This also explains why the WiThrottle app doesn't show the same issue. IIRC, Brett sets a minimum heartbeat of 4s and ignores requests for anything less. --SteveT

Trusty77 commented 2 years ago

The result is exactly the same, just with a delay of four seconds... Its seems that if the '4' message is well interpreted, the '10' sent later is ignored, and the delay stay to 4s...

mstevetodd commented 2 years ago

The result is exactly the same, just with a delay of four seconds... Its seems that if the '4' message is well interpreted, the '10' sent later is ignored, and the delay stay to 4s...

Ok, please post that log.

Also, a screenshot of the About screen at that point so we can see what value ED is using.

Trusty77 commented 2 years ago

Screenshot_20220328-090531 1 logcat1648404787096[2].txt

These are the files. If I understand what I see in the log file :

03-27 20:14:02.527 10254 10272 D Engine_Driver: -->:NEngineDriver (158) 03-27 20:14:02.634 10254 10272 D Engine_Driver: -->:HU1bc248be07751f0a (106) 03-27 20:14:02.738 10254 10272 D Engine_Driver: -->:M0+S44<;>S44 (104) 03-27 20:14:02.872 10254 10383 D Engine_Driver: <--:10 03-27 20:14:02.872 10254 10383 D Engine_Driver: <--:VN2.0 03-27 20:14:02.872 10254 10383 D Engine_Driver: version already set to 2.0, ignoring 03-27 20:14:02.872 10254 10383 D Engine_Driver: <--:HTDCC-EX 03-27 20:14:02.872 10254 10383 D Engine_Driver: <--:RL0 03-27 20:14:02.872 10254 10383 D Engine_Driver: <--:HtDCC-EX v3.2.0 ESP32rc12, ESP32, LABOX_MOTOR_SHIELD, f577c11 03-27 20:14:02.872 10254 10383 D Engine_Driver: <--:PTT][Turnouts}|{Turnout][THROW}|{2][CLOSE}|{4 03-27 20:14:02.872 10254 10383 D Engine_Driver: <--:PPA0 03-27 20:14:02.872 10254 10383 D Engine_Driver: <--:4 03-27 20:14:02.873 10254 10383 D Engine_Driver: <--:M0+S44<;>

For me, i seems that the HU message is sent, then M0 for a new loco, as a return from the server, a 10 and the welcome messages (HTDCC-EX ....) and then a 4 ! And the delay seems to stay like that. The 10 is sent far too early, or the 4 too late :) .

mstevetodd commented 2 years ago

@Trusty77 please post the entire log. The reduction of the heartbeat was added by Chris for something to do with the roster. But it should be setting it back immediately after. I'd like to see why that's not happening.

Trusty77 commented 2 years ago

The posted file was complete, but to be sure, I add a new file. logcat1648483802867[1].txt

mstevetodd commented 2 years ago

The posted file was complete, but to be sure, I add a new file. logcat1648483802867[1].txt

My apologies, I was viewing on my phone and only saw your excerpt and missed the attachment.

mstevetodd commented 2 years ago

@Trusty77 It looks like you immediately acquire a loco each time. Can you not do that and see if you still have the issue in that case? Connect, toggle the power to verify you're connected, but do not acquire a loco. Then disconnect and send me that log.

mstevetodd commented 2 years ago

The specific failure shown in these logs is that the requests for loco S44's speed and direction are never getting a response. Seemingly regardless of how long the timeout is. This is why killing the heartbeat looks like a "fix". The connection is not actually dropping, the CS is simply not responding to those messages for some unknown reason. The request is: M0A*<;>qV↵M0A*<;>qR

Note that EngineDriver uses this speed/dir polling as the "heartbeat" while a loco is acquired. It uses the "*" when no locos are acquired. IIRC, the WiThrottle app always uses the "*".

Trusty77 commented 2 years ago

The file with no loco selected, but power is on. There no disconnect/connect cycle in this case... logcat1648493480994[2].txt

mstevetodd commented 2 years ago

The file with no loco selected, but power is on. There no disconnect/connect cycle in this case... logcat1648493480994[2].txt

Thanks. That's consistent with what I'm seeing.

Please debug why your version of CS isn't replying to the M0A*<;>qV↵M0A*<;>qR message when a loco is acquired. Pretty sure that's the issue. The reply should look like: M0AS44<;>V33 M0AS44<;>R1

Ash-4 commented 2 years ago

@Trusty77 previously saved comments/suggestions on debugging this issue:

... an old copy of RingStream... it should be the same as master, or it may be inadvertently sensitive to the esp32 int length.

... an int16/32 issue when calling string formatter so the next parameter, probably a pointer to string, is corrupted causing a random area of memory to be copied to the ring stream.

... issue to resolve from ESP32 having 2 cores; passing variables between cores.

... ESP32 can be sensitive to using the same libraries, settings, etc. C:\Users\Ash\AppData\Local\Arduino15\packages\esp32\hardware\esp32\2.0.1\libraries File/Preferences/Board Manager URL
https://raw.githubusercontent.com/espressif/arduino-esp32/gh-pages/package_esp32_index.json Board: "ESP32 Dev Module"

EEPROM absence does not appear to result in errors. ADC1 pins are only for use for current sense pins, as ESP32-Ash code bypasses multiplexing for ADC1. ADC2 pins are reserved when using WiFi. Further testing is needed to determine if ADC2 pins can be used for digital output while using WiFi.


Remember that branch ESP32-Ash was for my testing. Development efforts may use some of the ideas, but branch ESP32-Ash will not be the branch merged into master. ESP32 development will include testing with Engine Driver, so that there is no disconnect/reconnect problem. Your assistance in identifying the problem is appreciated.

Trusty77 commented 2 years ago

In fact, it is more simpler : WiThrottle::loop() which call checkHeartBeat is never called, so the answers to a lot of EngineDriver questions are not sent. Please tell me where should be the best place to add this function call.

Asbelos commented 2 years ago

The loop and check heartbeat are obsolete and no longer required as the wifi connection is timing out before the heartbeat check would have done. The loop code has nothing to do with sending responses.

Trusty77 commented 2 years ago

When EngineDriver ask for informations with the message M0A<;>qV↵M0A<;>qR, the locoAction() place the broadcastPending flag to true for each loco. And checkHeartBeat() is the only place where this flag is checked to send or not these informations.

Asbelos commented 2 years ago

Sorry I was looking in the wrong place. When I check... WiThrottle::loop is called in Wifiinboundhandler... however I suspect that module may be compromised in the experimental replacement of the wifi handling in the esp32 ... which is why we do not yet support the esp32 code used in the branch I think you are using.

habazut commented 2 years ago

When EngineDriver ask for informations with the message M0A<;>qV↵M0A<;>qR, the locoAction() place the broadcastPending flag to true for each loco. And checkHeartBeat() is the only place where this flag is checked to send or not these informations.

Well, looks like WiThrottle::loop() needs to be called from WifiESP::loop() (or somewhere else, but probably from there).

Harald.

habazut commented 2 years ago

If you want to continue with the ESP32, please look what has been done on the PORTX_HAL branch. It has received a lot of WiThrottle fixes lately.

Harald.