Closed Trusty77 closed 2 years ago
Please explain why this change is necessary when EngineDriver on android has been working happily with this code for over a year.
@Trusty77 I am very familiar with EngineDriver. Feel free to contact me directly and we can investigate whatever you were seeing and why you felt this change would be beneficial. --SteveT
For me, in my case of usage, EngineDriver on my Nokia 8 Is disconnecting/connecting contiuously every 10s.. To be clear, I use the branch ESP32 from version 3, in Wifi AP mode. Perhaps the problem does not happen on 328 chips, or in WIFI local network mode. A friend with the same version of CommandStation on ESP32 and EngineDriver have the same problem. WiThrottle works perfectly.
Note that this PR is not against the ESP32 branch, rather against the master branch. I don't see how turning off the heartbeat would have any effect on the lost connection, rather it would just hide the fact that the connection was lost. But, if you really want to turn off the heartbeat on your copy, just set the timeout value to a very high number of seconds, here: https://github.com/DCC-EX/CommandStation-EX/blob/ESP32/WiThrottle.h#L40
I have proposed this modification in the master branch because i thank that the problem was also there... I dont have the hardware to test on a Uno/Mega... The connection is not really cut, EngineDriver continue to work, but WiThrottle consider that EngineDriver dont respond to its heartbeat and stop the connexion. Ten second later, the ESP32 retrieve the connection, with the same ip, reconnect immediately and works for 10 seconds. Et ceatera. I have had exactly the same symptom in my implementation of WiThrottle protocol for the project Locoduino/Labox based on my library DCCpp, itself based on the original DCC++ ... The solution was to remove the heartbeat. Even without the heartbeat, there is no problem to detect by the Wifi protocol the end of a connection.
Since we don't have this problem on a mega, as far as i can tell, I'm wondering why the esp32 experimental branch has this problem, and is it something to do with the esp32 wifi or are you simply using an out of date engine driver version because we know Ed Does support the timeout.
@Trusty77 To debug this, please turn on the Log Capture in EngineDriver prior to connecting, go through minimal steps to recreate the problem, then close EngineDriver. Then post or email me the EngineDriver log file.
I did not realize that I discuss with the creator of EngineDriver ! So if someone could help me, this is you. EngineDriver version is 2.31.139. I have tested with a connection to my local network, and in AP mode. Same result in both cases. Here one of the log files i have done.
@Trusty77 I see that you are using branch ESP32-Ash. I also had the problem. So I added your 5-line fix to exclude the heartbeat in my local copy. And it worked for me.
Keep in mind that ESP32-Ash is not ready for anything more than testing. I am told that the waveform is somewhat jittery, although I don't know how to confirm it. I made a few changes which eliminated the errors I saw with logic analyzer, and am able to reliably read CVs. I'd suggest you join the discussion on discord where we can discuss your implementation, etc.: https://discord.gg/PuPnNMp8Qf
Perhaps Steve will see something in your log. He is also on discord.
Current path forward on ESP32 development will include testing with Engine Driver, so that there is no disconnect/reconnect problem.
I am quite positive that the patch is not doing what you think it does as you are mixing up the concept of stopping the loco when the hearbeat goes away and the sending of the "I want you to send hearbeat every X seconds" from server to client. Then you have somehow mixed up command parsing and if there would be 2 letter commands you solution would crash. There probably IS a bug somewhere but this is not the right solution. I am not sure how we should handle the situation with a TCP connection that is so bad that there is say a 30 sec delay for a WiThrottle message but the connection is not completely dead and so to say will recover and send new chars. I think we have to think about this and add more diag code for this situation.
Harald.
@Trusty77 thanks for the log. The log shows two different heartbeat values being sent from esp to enginedriver. The normal 10s one works fine. The subsequent 1s heartbeat seems too fast for the esp to respond. Find and change that "1" to "4" and see what results. This also explains why the WiThrottle app doesn't show the same issue. IIRC, Brett sets a minimum heartbeat of 4s and ignores requests for anything less. --SteveT
The result is exactly the same, just with a delay of four seconds... Its seems that if the '4' message is well interpreted, the '10' sent later is ignored, and the delay stay to 4s...
The result is exactly the same, just with a delay of four seconds... Its seems that if the '4' message is well interpreted, the '10' sent later is ignored, and the delay stay to 4s...
Ok, please post that log.
Also, a screenshot of the About screen at that point so we can see what value ED is using.
These are the files. If I understand what I see in the log file :
03-27 20:14:02.527 10254 10272 D Engine_Driver: -->:NEngineDriver (158) 03-27 20:14:02.634 10254 10272 D Engine_Driver: -->:HU1bc248be07751f0a (106) 03-27 20:14:02.738 10254 10272 D Engine_Driver: -->:M0+S44<;>S44 (104) 03-27 20:14:02.872 10254 10383 D Engine_Driver: <--:10 03-27 20:14:02.872 10254 10383 D Engine_Driver: <--:VN2.0 03-27 20:14:02.872 10254 10383 D Engine_Driver: version already set to 2.0, ignoring 03-27 20:14:02.872 10254 10383 D Engine_Driver: <--:HTDCC-EX 03-27 20:14:02.872 10254 10383 D Engine_Driver: <--:RL0 03-27 20:14:02.872 10254 10383 D Engine_Driver: <--:HtDCC-EX v3.2.0 ESP32rc12, ESP32, LABOX_MOTOR_SHIELD, f577c11 03-27 20:14:02.872 10254 10383 D Engine_Driver: <--:PTT][Turnouts}|{Turnout][THROW}|{2][CLOSE}|{4 03-27 20:14:02.872 10254 10383 D Engine_Driver: <--:PPA0 03-27 20:14:02.872 10254 10383 D Engine_Driver: <--:4 03-27 20:14:02.873 10254 10383 D Engine_Driver: <--:M0+S44<;>
For me, i seems that the HU message is sent, then M0 for a new loco, as a return from the server, a 10 and the welcome messages (HTDCC-EX ....) and then a 4 ! And the delay seems to stay like that. The 10 is sent far too early, or the 4 too late :) .
@Trusty77 please post the entire log. The reduction of the heartbeat was added by Chris for something to do with the roster. But it should be setting it back immediately after. I'd like to see why that's not happening.
The posted file was complete, but to be sure, I add a new file. logcat1648483802867[1].txt
The posted file was complete, but to be sure, I add a new file. logcat1648483802867[1].txt
My apologies, I was viewing on my phone and only saw your excerpt and missed the attachment.
@Trusty77 It looks like you immediately acquire a loco each time. Can you not do that and see if you still have the issue in that case? Connect, toggle the power to verify you're connected, but do not acquire a loco. Then disconnect and send me that log.
The specific failure shown in these logs is that the requests for loco S44's speed and direction are never getting a response. Seemingly regardless of how long the timeout is. This is why killing the heartbeat looks like a "fix". The connection is not actually dropping, the CS is simply not responding to those messages for some unknown reason.
The request is:
M0A*<;>qV↵M0A*<;>qR
Note that EngineDriver uses this speed/dir polling as the "heartbeat" while a loco is acquired. It uses the "*
" when no locos are acquired. IIRC, the WiThrottle app always uses the "*
".
The file with no loco selected, but power is on. There no disconnect/connect cycle in this case... logcat1648493480994[2].txt
The file with no loco selected, but power is on. There no disconnect/connect cycle in this case... logcat1648493480994[2].txt
Thanks. That's consistent with what I'm seeing.
Please debug why your version of CS isn't replying to the
M0A*<;>qV↵M0A*<;>qR
message when a loco is acquired.
Pretty sure that's the issue. The reply should look like:
M0AS44<;>V33
M0AS44<;>R1
@Trusty77 previously saved comments/suggestions on debugging this issue:
... an old copy of RingStream... it should be the same as master, or it may be inadvertently sensitive to the esp32 int length.
... an int16/32 issue when calling string formatter so the next parameter, probably a pointer to string, is corrupted causing a random area of memory to be copied to the ring stream.
... issue to resolve from ESP32 having 2 cores; passing variables between cores.
... ESP32 can be sensitive to using the same libraries, settings, etc.
C:\Users\Ash\AppData\Local\Arduino15\packages\esp32\hardware\esp32\2.0.1\libraries
File/Preferences/Board Manager URL
https://raw.githubusercontent.com/espressif/arduino-esp32/gh-pages/package_esp32_index.json
Board: "ESP32 Dev Module"
EEPROM absence does not appear to result in errors. ADC1 pins are only for use for current sense pins, as ESP32-Ash code bypasses multiplexing for ADC1. ADC2 pins are reserved when using WiFi. Further testing is needed to determine if ADC2 pins can be used for digital output while using WiFi.
Remember that branch ESP32-Ash was for my testing. Development efforts may use some of the ideas, but branch ESP32-Ash will not be the branch merged into master. ESP32 development will include testing with Engine Driver, so that there is no disconnect/reconnect problem. Your assistance in identifying the problem is appreciated.
In fact, it is more simpler : WiThrottle::loop() which call checkHeartBeat is never called, so the answers to a lot of EngineDriver questions are not sent. Please tell me where should be the best place to add this function call.
The loop and check heartbeat are obsolete and no longer required as the wifi connection is timing out before the heartbeat check would have done. The loop code has nothing to do with sending responses.
When EngineDriver ask for informations with the message M0A<;>qV↵M0A<;>qR, the locoAction() place the broadcastPending flag to true for each loco. And checkHeartBeat() is the only place where this flag is checked to send or not these informations.
Sorry I was looking in the wrong place. When I check... WiThrottle::loop is called in Wifiinboundhandler... however I suspect that module may be compromised in the experimental replacement of the wifi handling in the esp32 ... which is why we do not yet support the esp32 code used in the branch I think you are using.
When EngineDriver ask for informations with the message M0A<;>qV↵M0A<;>qR, the locoAction() place the broadcastPending flag to true for each loco. And checkHeartBeat() is the only place where this flag is checked to send or not these informations.
Well, looks like WiThrottle::loop() needs to be called from WifiESP::loop() (or somewhere else, but probably from there).
Harald.
If you want to continue with the ESP32, please look what has been done on the PORTX_HAL branch. It has received a lot of WiThrottle fixes lately.
Harald.
The program EngineDriver for Android, using the same protocol as WiThrottle, do not handle heartBeat. So in this case, remove the timings sent to the throttle ("1" and "10").