ftctechnh / ftc_app

FTC Android Studio project to create FTC Robot Controller app.
761 stars 3.16k forks source link

"Restart Robot" abandons watchdog on active OpMode #709

Closed Windwoes closed 5 years ago

Windwoes commented 5 years ago

As I'm sure anyone who has ever written an autonomous OpMode knows, if your thread does not exit within 1000ms of a stop being requested, the watchdog growls, and the app force-crashes itself in order to terminate your rogue thread. (This is necessary because it is not possible to kill -9 a single thread).

However, I have found a serious bug in the implementation of the watchdog: it appears that the watchdog of the active OpMode is abandoned when a "Restart Robot" command is issued. This causes the RC device to perform a restart robot as if everything were normal, but the rogue thread continues to run in the background.

Here's a sample OpMode:

@TeleOp
public class RogueLoop extends LinearOpMode
{
    @Override
    public void runOpMode()
    {
        waitForStart();

        while (true) //muahaha Thread.interrupt() ain't gonna do you any good here!
        {
            long iT = System.currentTimeMillis();

            while (System.currentTimeMillis() - iT < 500)
            {
                Thread.yield();
            }

            System.out.println("Sysout from rogue loop");
        }
    }
}

Steps to reproduce:

  1. Deploy the SDK to the RC phone with the above OpMode added to the TeamCode module
  2. Ensure that Android Studio is printing logcat to the bottom pane
  3. Ensure the DS and RC phones are connected as normal
  4. Init and run the "RogueLoop" OpMode
  5. Observe that the OpMode prints a message to logcat every 500ms
  6. While the OpMode is still running, press "Restart Robot".
  7. Wait for the "Restart Robot" to complete
  8. Observe that the rogue OpMode continues to print messages to the logcat every 500ms, despite the DS and RC showing every indication that only the "StopRobot" OpMode is being run
Windwoes commented 5 years ago

@gearsincorg will this be fixed in v5.0 too?

gearsincorg commented 5 years ago

No. I've added it as an issue, but I'm not skilled in this area to attempt a fix. (Suggestions?) 5.0 is an early-bird release, primarily for the Control Hub Pilot teams. The hope is to have another 5.x pre-release, and then the main 5.y release in September.

Windwoes commented 5 years ago

@gearsincorg I did some digging and it appears this issue can be fixed by adding the following as the first thing in EventLoopManager.stopEventLoop():

    if(eventLoop.getOpModeManager() != null)
    {
        eventLoop.getOpModeManager().stopActiveOpMode();
    }
gearsincorg commented 5 years ago

Excellent... I can implement this in the current private repo and see if I can replicate your results.

We appreciate your efforts to find and suggest fixes for these problem.

Phil.

Windwoes commented 5 years ago

@gearsincorg any update on this?

Windwoes commented 5 years ago

Also, one other thing, I'd like to revise my suggested fix to:

    if(eventLoop.getOpModeManager() != null)
    {
        eventLoop.getOpModeManager().initActiveOpMode(OpModeManagerImpl.DEFAULT_OP_MODE_NAME);
    }
NoahAndrews commented 5 years ago

Why the switch?

Windwoes commented 5 years ago

Why the switch?

Because that's what's done when the DS sends the command to stop an OpMode. It doesn't call stopActiveOpMode(), instead it calls initActiveOpMode(OpModeManagerImpl.DEFAULT_OP_MODE_NAME)

gearsincorg commented 5 years ago

@FROGbots-4634 I've tested both of your proposed fixes.

The first one successfully detects the stuck loop condition and aborts the running thread the same way it would do if a stop had been requested. The DS is sent the Stuck-Loop error and the RC restarts the app and comes up in a ready to run condition.

The second suggestion did not appear to have any effect. The stuck opmode continued to run through the restart operation, and stayed in memory to permit multiple instances to execute.

After some more testing, I'll be submitting the first approach as a change to the code base.

Thanks for your input.

Windwoes commented 5 years ago

@gearsincorg Ok cool! I never actually tested the second suggestion..... though don't really understand why it wouldn't work, given that's what the DS does. Then again I don't quite fully understand the state machine in OpModeManagerImpl...

Windwoes commented 5 years ago

@ftctechnh does this mean it was fixed in the internal repo?

ftctechnh commented 5 years ago

@FROGbots-4634 Yes. Thanks for the fix! It works well.