ftctechnh / ftc_app

FTC Android Studio project to create FTC Robot Controller app.
761 stars 3.16k forks source link

Gamepads sometimes disconnect from DS following ESD event #703

Open Windwoes opened 5 years ago

Windwoes commented 5 years ago

This has got to be the weirdest bug I have ever seen, but I have been able to reproduce it nearly 10 times so I'm very confident in my analysis.

Sometimes, right after an ESD event, the gamepad will inexplicably disconnect from the driver station. All that is required to reproduce it is to drive the robot around the field to build up static charge, and then drive it so that it touches an object at a lower potential (or a grounded object); thus creating an ESD event. I have been able to reproduce this with 1) ESD from the field wall 2) ESD from the lander leg 3) ESD from a wheel of the robot touching the metal base of a floor lamp. I actually managed to catch that last one on video. Please ignore the warning message on the driver station; I had a webcam disconnected during this test.

Logs from the DS and RC of this encounter can be found on my other issue that I opened today. (At first I thought that previous issue and this were related and thus posted the logs there; but I now believe they are separate and apart issues)

Windwoes commented 5 years ago

Update: this bug bit me in a real match in Detroit (Ochoa Q-115)...

Not only do I have a commented log from both the driver station and the robot controller, but here's a timestamp to the livestream a few seconds before the event for full context.

Windwoes commented 5 years ago

One notable difference between the behavior I saw in Detroit and that I saw in testing is that in Detroit, not only did the gamepads disconnect, but the OpMode also stopped - in my previous testing, the gamepads disconnected but the OpMode continued to run...

AustinShalit commented 5 years ago

OpModes stopping when a gamepad disconnects is probably a good thing.

Windwoes commented 5 years ago

@AustinShalit whether it is a good thing or not, it is not the normal behavior. Besides, the question here is why are the gamepads disconnecting from the DS....

gearsincorg commented 5 years ago

OpModes stopping when a gamepad disconnects is probably a good thing.

Unless you are running in autonomous, in which case it's an unnecessary disaster. The current behavior is/should be that gamepad buttons/joysticks revert back to their "safe" (off) values.

Windwoes commented 5 years ago

@gearsincorg as I mentioned to you in Detroit, it seems that this issue was triggered by onPause() being called when it shouldn't have (see line 106 in DS log). If you look in FtcDriverStationActivity, you will find this:

    protected void onPause()
    {
        super.onPause();
        RobotLog.vv(TAG, "onPause()");
        analytics.unregister();
        gamepadManager.clearGamepadAssignments();
        mInputManager.unregisterInputDeviceListener(this);
        initDefaultOpMode();
    }

Notice that the gamepads are dropped and the default OpMode (StopRobot) is started.

gearsincorg commented 5 years ago

OK, I've processed the two log files from issue #702 and created a combined log with corrected timestamps. From this I can see what came first, the chicken or the egg, and it's pretty interesting.

Sequence of events Note: to sync the two phone clocks we can look at the Opmode run commands command response times below and determine the time dif between the DS and RC

DS ## 03-07 11:33:09.636 17059 17180 V Robocol : { 4605 14.241} sending CMD_RUN_OP_MODE(14374), attempt: 0
RC -- 03-07 11:33:12.944  3469  3576 V Robocol : received command: CMD_INIT_OP_MODE(14368) $Stop$Robot$
DS ## 03-07 11:33:09.667 17059 17175 V Robocol : { 4606 14.273} received command: CMD_NOTIFY_RUN_OP_MODE(4624) $Stop$Robot$

This indicates that the RC clock is ahead of the DS clock by approx. 3.3 sec (+/- 0.015 sec) So, the time-synched event sequence is:

DS ## 03-07 11:31:15.109 17059 17175 V Robocol : { 4606 19.714} received command: CMD_NOTIFY_RUN_OP_MODE(1020) A - Practice TeleOp
DS ## 03-07 11:31:15.109 17059 17174 V DriverStation: { 4606 19.715} Robot Controller starting op mode: A - Practice TeleOp
RC -- 03-07 11:31:16.382  3469  3576 V Robocol : received command: CMD_RUN_OP_MODE(10887) A - Practice TeleOp
RC -- 03-07 11:31:16.410  3469  3629 V Robocol : sending CMD_NOTIFY_RUN_OP_MODE(1020), attempt: 0
. . .
. . .
DS ## 03-07 11:31:45.473 17059 17174 D SoundPlayer: { 4604 50.077} playing volume=1.000000 samp=8|ms=1044
RC -- 03-07 11:31:47.637  3469  3595 I FtcEventLoop: ^^===== MODULE ATTACH: complete REV Robotics USB..  Hub Portal [DQ2ASZXG]=====^^
   -- ESD EVENT OCCURS
DS ## 03-07 11:33:05.080 17059 17059 V DriverStation: { 4605  9.685} Input device (id = 21) removed.
DS ## 03-07 11:33:05.116 17059 17180 V GamepadManager: { 4605  9.721} transmitting synthetic gamepad user=ONE
RC -- 03-07 11:33:05.323  3469  3576 V EventLoopManager: synthetic gamepad received: id=-2 user=ONE atRest=true
DS ## 03-07 11:33:05.398 17059 17059 V DriverStation: { 4609 10.007} New input device (id = 22) detected.

DS ## 03-07 11:33:09.367 17059 17059 V StateMachine: { 4610 13.977} State with no transitions: WifiState
DS ## 03-07 11:33:09.410 17059 17059 V StateMachine: { 4609 14.019} State with no transitions: WifiState
DS ## 03-07 11:33:09.636 17059 17180 V Robocol : { 4605 14.241} sending CMD_RUN_OP_MODE(14374), attempt: 0
RC -- 03-07 11:33:09.644  3469  3576 V Robocol : received command: CMD_INIT_OP_MODE(14368) $Stop$Robot$
DS ## 03-07 11:33:09.667 17059 17175 V Robocol : { 4606 14.273} received command: CMD_NOTIFY_RUN_OP_MODE(4624) $Stop$Robot$
DS ## 03-07 11:33:09.668 17059 17174 V DriverStation: { 4606 14.274} Robot Controller starting op mode: $Stop$Robot$

From this I think we can conclude:

  1. The first record of any event relating to the ESD and/or game controller was on the Driver Station when the Input device was removed. So there is no log of any trigger from the Robot Controller
  2. The Gamepad reappears 310 mSec after the initial disconnect
  3. The DS sends the $Stop$Robot command 4.3 seconds AFTER the gamepad is reconnected.
    The only significant event before this is the WiFi state change, which appears to initiate the $Stop$Robot action. Still no smoking gun from RC.

Based on this, my conclusion would have to be that the DS phone/gamepad connection was disrupted by the ESD event, but not through the robot, but rather this points to the disruption being transmitted via a EM pulse (ie: radio interference) that is somehow seen at the Gamepad interface, and/or is locking up the WiFi, and confusing the WiFi state machine..

Anyone else see a trigger I'm missing?

Windwoes commented 5 years ago

@gearsincorg Interesting analysis, a few points of my own:

  1. It seems like you looked at the logs from my previous testing which show slightly different symptoms than what I saw in Detoit
  2. The stop robot 4.3 seconds after the event was from me manually pressing stop.
  3. In order to test if EMI/ESD to the DS USB connection was the cause, I walked around the field to build up charge and zapped a grounded object within 2 inches of the DS USB connection about 5 times and nothing odd happened so I'm inclined to believe that is not the cause.
  4. We should also merge the logs from Q115 in Detroit and see what they show since the symptoms were slightly different

Edit for clarity: in testing the OpMode did not stop by itself, I stopped it manually. In Q115, the OpMode stopped by itself.

gearsincorg commented 5 years ago

We should also merge the logs from Q115 in Detroit and see what they show since the symptoms were slightly different

@FROGbots-4634 Since you have the originals can you do that? Double check my time synching as well.

Windwoes commented 5 years ago

@gearsincorg

Since you have the originals can you do that? Double check my time synching as well.

I can try, though I'm not entirely sure how you determined the time difference? Did you assume the command arrived at the RC at exactly the time it was sent from the DS?

Also, what exactly does this message mean?

StateMachine: { -114 7.297} State with no transitions: WifiState

gearsincorg commented 5 years ago

Re: WiFiState change. Yes, I also noticed that.

To determine time difference I found an RC log line that must occur between two close DS log lines, and split the 30mS time difference.

DS ## 03-07 11:33:09.636 17059 17180 V Robocol : { 4605 14.241} sending CMD_RUN_OP_MODE(14374), attempt: 0
RC -- 03-07 11:33:12.944  3469  3576 V Robocol : received command: CMD_INIT_OP_MODE(14368) $Stop$Robot$
DS ## 03-07 11:33:09.667 17059 17175 V Robocol : { 4606 14.273} received command: CMD_NOTIFY_RUN_OP_MODE(4624) $Stop$Robot$

RC 12.944 - DS 9.65 = 3.3 sec (approx)

Windwoes commented 5 years ago

Ok I think I understand. One other thing I suppose I could do to test whether EMI is really at play would be to back up say 12FT from the field while driving around and see if I can still reproduce.

gearsincorg commented 5 years ago

@cmacfarl @rgatkinson @tomeng70

What can cause this DS log message:

V StateMachine: { 4610 13.977} State with no transitions: WifiState

It's happening around the time of the ESD/Gamepad disconnection. Is this expected under any situations, or is it indicative of a problem?

gearsincorg commented 5 years ago

Ok I think I understand. One other thing I suppose I could do to test whether EMI is really at play would be to back up say 12FT from the field while driving around and see if I can still reproduce.

Yes, that would definitely help to determine if the Gamepad is seeing the EMI, but it would not prevent the WiFi from being effected.

gearsincorg commented 5 years ago

Since you are doing more testing, it would help if you re-synched your phones. It should be enough to just let them both connect to the internet for a short while to get them to do a time update.

Windwoes commented 5 years ago

Ok, so assuming that the DS is ~115ms ahead of RC, here is a cleaned-up and the time-synced log for Q115:

DS ## 04-26 08:45:29.184  3512  3683 V Robocol : sending CMD_RUN_OP_MODE(19744), attempt: 0
RC -- 04-26 08:45:29.186  1979  2228 V Robocol : received command: CMD_RUN_OP_MODE(19744) A - Main TeleOp
RC -- 04-26 08:45:29.201  1979  2312 V Robocol : sending CMD_NOTIFY_RUN_OP_MODE(36906), attempt: 0
DS ## 04-26 08:45:29.202  3512  3677 V Robocol : received command: CMD_NOTIFY_RUN_OP_MODE(36906) A - Main TeleOp
DS ## 04-26 08:45:29.203  3512  3676 V DriverStation: Robot Controller starting op mode: A - Main TeleOp
DS ## 04-26 08:45:29.203  3512  3676 V DriverStation: ui:uiWaitingForStopEvent

 === Driving the robot around for a while ==

DS ## 04-26 08:46:07.297  3512  3512 V StateMachine: State with no transitions: WifiState
DS ## 04-26 08:46:07.297  3512  3512 V DriverStation: onPause()
DS ## 04-26 08:46:07.305  3512  3683 V GamepadManager: transmitting synthetic gamepad user=ONE
DS ## 04-26 08:46:07.305  3512  3683 V GamepadManager: transmitting synthetic gamepad user=TWO
DS ## 04-26 08:46:07.306  3512  3683 V Robocol : sending CMD_INIT_OP_MODE(21961), attempt: 0
RC -- 04-26 08:46:07.308  1979  2228 V EventLoopManager: synthetic gamepad received: id=-2 user=ONE atRest=true 
RC -- 04-26 08:46:07.311  1979  2228 V EventLoopManager: synthetic gamepad received: id=-2 user=TWO atRest=true
RC -- 04-26 08:46:07.312  1979  2228 V Robocol : received command: CMD_INIT_OP_MODE(21961) $Stop$Robot$
DS ## --------- beginning of system
DS ## 04-26 08:46:07.363  3512  3512 W ViewRootImpl[FtcDriverStationActivity]: Dropping event due to no window focus: MotionEvent { action=ACTION_MOVE, actionButton=0, id[0]=0, x[0]=0.0039215684, y[0]=-0.0039215684, toolType[0]=TOOL_TYPE_UNKNOWN, buttonState=0, metaState=0, flags=0x0, edgeFlags=0x0, pointerCount=1, historySize=0, eventTime=1834156, downTime=0, deviceId=11, source=0x1000010 }
DS ## 04-26 08:46:07.380  3512  3512 V DriverStation: onResume()
RC -- 04-26 08:46:07.390  1979  4039 V RobotCore: thread: ...terminating 'LinearOpMode main'
RC -- 04-26 08:46:07.393  1979  3677 I RobotCore: Attempting to switch to op mode $Stop$Robot$
RC -- 04-26 08:46:07.440  1979  2312 V Robocol : sending CMD_NOTIFY_INIT_OP_MODE(39167), attempt: 0
DS ## 04-26 08:46:07.442  3512  3677 V Robocol : received command: CMD_NOTIFY_INIT_OP_MODE(39167) $Stop$Robot$
DS ## 04-26 08:46:07.442  3512  3676 V DriverStation: Robot Controller initializing op mode: $Stop$Robot$
DS ## 04-26 08:46:07.446  3512  3676 V DriverStation: ui:uiWaitingForInitEvent
DS ## 04-26 08:46:07.447  3512  3676 V StateMachine: State with no transitions: WifiState
DS ## 04-26 08:46:07.464  3512  3683 V Robocol : sending CMD_RUN_OP_MODE(21973), attempt: 0
RC -- 04-26 08:46:07.466  1979  2228 V Robocol : received command: CMD_RUN_OP_MODE(21973) $Stop$Robot$
RC -- 04-26 08:46:07.480  1979  2312 V Robocol : sending CMD_NOTIFY_RUN_OP_MODE(39173), attempt: 0
DS ## 04-26 08:46:07.483  3512  3677 V Robocol : received command: CMD_NOTIFY_RUN_OP_MODE(39173) $Stop$Robot$
DS ## 04-26 08:46:07.483  3512  3676 V DriverStation: Robot Controller starting op mode: $Stop$Robot$
DS ## 04-26 08:46:07.483  3512  3676 V DriverStation: ui:uiWaitingForInitEvent
Windwoes commented 5 years ago

As I mentioned before, the onPause() call in that log deserves scrutiny, because the DS drops gamepads and switches to $Stop$Robot when onPause() is called. You can see in the lower right hand corner of the livestream video that nobody is touching the DS screen or doing anything weird to it... Also notice that in the log that the onPause() call happens 37 seconds after the start of the teleop program, which perfectly matches up with the time the issue manifests itself visually in the livestream video.

gearsincorg commented 5 years ago

Something I noticed on the live stream video....

Your GREEN LED strip appears to go off the split second that your dispenser touches the rim of the lander. This would imply that there is no timeout involved with this issue....

How is this strip driven? Is it a REV Blinky, or some other method? Just seeing if this provides any additional clues as to the trigger event.

Does the arm retract require powered control, or is it reflex?

Windwoes commented 5 years ago

The LED strip is based on the APA102 chipset and is controlled using a very simply SPI protocol. I'm controlling it by using an I2C to SPI bridge which the GDC surprisingly allowed this year.

The strip turning off is indeed an indication that the comms link was still very much alive, as the program turns off the strip before stopping:

/*
 * A stop was requested and we're about to get out of dodge;
 * turn off the LED strip so as to not waste power.
 */
for(DotstarPixel pixel : robot.dotstarStrip.getAllPixels())
{
    //Set each pixel to off
    pixel.queueRgb(0,0,0);
}
robot.dotstarStrip.apply();

The dumper arm does not require power to retract; gravity is more than enough to bring it down from full extension. During normal operation I actually power it down in closed-loop mode to prevent it from crashing down unnecessarily hard. I think the reason it didn't come down all the way after the program stopped in the live stream is that power to the dumper servo was cut and thus it wasn't holding position, and I think the dumper caught on the slides as it was going down.

gearsincorg commented 5 years ago

Let me recap... So, from what you are saying, an ESD would not naturally be able to turn ALL the LEDs off. (I assume they hold their last state if there is no more SPI data). From this you deduce that the ESD must have induced the DS to stop the opmode running....

So once again we are left to wonder how the DS knew there was an ESD... Seems like we need some more comms diagnostics to see the actual trigger.

Windwoes commented 5 years ago

Let me recap... So, from what you are saying, an ESD would not naturally be able to turn ALL the LEDs off. (I assume they hold their last state if there is no more SPI data).

Yes that's correct.

From this you deduce that the ESD must have induced the DS to stop the opmode running....

Well, I'm not just deducing it, you can see from the logs that the DS sends the stop command...

So once again we are left to wonder how the DS knew there was an ESD... Seems like we need some more comms diagnostics to see the actual trigger.

I'm not sure whether we need more comms diagnostics or Android diagnostics. That call to onPause() is very suspicious. Noah suggested to me that I might try building LineageOS with extra logging enabled to try to trace back which code caused that call to onPause(). However, there is another issue which is that it is now warm where I live and thus the humidity sucking gas heat is no longer running. This means that my house is no longer an ESD city with 1/3" arcs from walking across the floor... which in turn means that any ESD testing I wish to do with my robot will be nearly impossible :(

gearsincorg commented 4 years ago

@FROGbots-4634 Purely on the topic of the Gamepad becoming disconnected, and ignoring for the moment any possible change in app lifecycle.... Over the last cold dry week in MD, I have witnessed several gamepad disconnected that appear to synch with metal-metal contact on the robot.

Symptoms: None of these particular disconnects were accompanied by the opmode stopping.
In several instances only one gamepad became disconnected. Game play was resumed as soon as the Select A sequence was used. When looking at the RC and DS logs at this event, there is no indication of any events/communications other than the gamepad disconnect and reconnect.

So in these instances is seems to be purely a DS/Gamepad issue, somehow magically being caused by the robot ESD.

This is my moment of brilliance: If this IS being induced by high levels of EM radiation, with the gamepad cable being a great receiving antenna, then I wonder if adding Ferrite Chokes to the Gamepad cables might help suppress the effect. One at each end of the cable might be extra effective.

Just a thought.

Windwoes commented 4 years ago

@gearsincorg Well that's actually good that you were able to reproduce somewhat, as that means I'm not entirely out to lunch. When I see it, the OpMode does not always stop. I'm not sure why it would stop sometimes and not others.

Oooh, that is an interesting idea. And it certainly can't hurt. I find it hard to believe that EMI from the robot could be strong enough to affect DS several feet away, but without putting a scope on the gamepad wires we're just guessing right now...

gearsincorg commented 4 years ago

The issue is: I don't know what needs to happen protocol-wise on the USB for the phone to consider the gamepad disconnected. It's hard to believe that a single bad packet transmission would be enough of a problem to disconnect. But perhaps a handshake gets corrupted and something times out.

I still wish we could do automatic reconnects.

Windwoes commented 4 years ago

@gearsincorg So here's something interesting..... I'm not sure if the fact that it's a totally new robot has anything to do with it, but this year we're using Xbox controllers and the same USB hub on the DS and I have yet to see the gamepads drop, even across dozzens of audible arcs to/from the robot.

gearsincorg commented 4 years ago

Need to go back to Logitech and see if dropouts re-appear.

slylockfox commented 4 years ago

I was FTA at a qualifier on Dec 7, where in one match we had three robots become disabled. Two of them exhibited common symptoms of USB disconnection between the RC phone and REV hub, but the third robot lost driver control and ran across the field. The team was able to stop the opmode. After the match, we started the opmode again, and the robot continued to drive straight, without any driver input. We stopped the opmode, disconnected and reconnected the USB hub to the DS phone, and restarted the opmode, and the robot was back to normal. So evidently there was a malfunction of the controller, USB hub, or DS phone, which persisted across opmode restart, and was cleared by disconnecting the USB hub from the DS phone, and it happened during a match where circumstantially it seemed like other robots were having static-related issues. I'm checking to see if the team still has logs from that match.

KevinEmery commented 4 years ago

@slylockfox

I've seen that fairly often, I've attributed it to a flaky connection between the gamepad and the DS phone. It temporarily disconnects, and when it reconnects it re-zero's the gamepad at whatever point everything is at that moment. So all of a sudden full forward becomes 0 and neutral becomes full backwards. My process for helping teams during a match with it is below, usually taking 10-15 seconds.

  1. Stop the Op Mode (so the erratic behavior stops)
  2. Unplug the controllers
  3. Make sure no one is pressing any buttons
  4. Replug and initialize the controllers
  5. Team restarts the op mode and continues with the match

This is also mentioned in the Control System Troubleshooting Guide, section 11.2.3.

Windwoes commented 4 years ago

Need to go back to Logitech and see if dropouts re-appear.

Yeah, I should try that at some point.... I still have yet to see the Xbox 360 controllers drop from DS after many, many ~1500ms or so USB dropouts on the robot due to ESD.