acmerobotics / ftc-dashboard

React-based web dashboard designed for FTC
https://acmerobotics.github.io/ftc-dashboard
Other
171 stars 129 forks source link

Fatal robot controller crash during config update in OpMode transition #75

Closed sbdevelops closed 2 years ago

sbdevelops commented 2 years ago

A ConcurrentModificationException can occur when adding config variables to FtcDashboard during OpMode instantiation / initialization. Our team has been able to repeatedly reproduce this issue, which causes the FtcRobotController app to crash.

Details:

The issue appears to be due to FtcDashboard.configRoot variable not being used in a thread-safe manner. When an OpMode (including the default OpMode) is ended, FtcDashboard.onOpModePostStop() is called as it is a registered listener. The following code block shows a shortened version of onOpModePostStop().

@Override
    public void onOpModePostStop(OpMode opMode) {
        synchronized (opModeLock) { ... }

        synchronized (configLock) { ... }

        // this callback is sometimes called from the UI thread
        (new Thread() {
            @Override
            public void run() {
                updateConfig();
            }
        }).start();

        { ... }
    }

Once the new thread is instantiated and begins to run, onOpModePostStop() returns and the event loop thread continues to its next step, instantiating and initializing the new OpMode. Our OpMode that causes the crash then makes a call to FtcDashboard.addConfigVariable(...), which writes to the HashMap contained in configRoot - the same HashMap that the updateConfig() method is currently iterating over in the separate thread! The modification to a HashMap that another thread is iterating over causes the ConcurrentModificationException and subsequent crash.

This is the OpMode in which we experience the aforementioned behavior: TeleOpCore.kt - EagleRobotics7373:FtcRobotController7373:t/dashboard-debug The class referenced in our OpMode that actually calls addConfigVariable(): DashboardVar.kt - EagleRobotics7373:FtcRobotController7373:t/dashboard-debug

Suggestions:

I would imagine some options for potential fixes being:

  1. Don't create a new Thread within onOpModePostStop(), and just let updateConfig() briefly block the event loop thread. The updateConfig() method is called in a blocking manner within addConfigVariable(), so it's not clear why a separate thread is needed anyway in onOpModePostStop().
  2. Encase the updateConfig() call (specifically the one shown above) inside a synchronized(configLock) { ... } structure, and do similar for configRoot.putVariable(...) calls

Stack trace:

1969-12-31 19:34:43.024 3417-3540/com.qualcomm.ftcrobotcontroller E/AndroidRuntime: FATAL EXCEPTION: Thread-17
    Process: com.qualcomm.ftcrobotcontroller, PID: 3417
    java.util.ConcurrentModificationException
        at java.util.HashMap$HashIterator.nextEntry(HashMap.java:851)
        at java.util.HashMap$EntryIterator.next(HashMap.java:891)
        at java.util.HashMap$EntryIterator.next(HashMap.java:890)
        at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.write(MapTypeAdapterFactory.java:206)
        at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.write(MapTypeAdapterFactory.java:145)
        at com.google.gson.Gson.toJson(Gson.java:669)
        at com.google.gson.Gson.toJsonTree(Gson.java:562)
        at com.google.gson.Gson.toJsonTree(Gson.java:541)
        at com.google.gson.internal.bind.TreeTypeAdapter$GsonContextImpl.serialize(TreeTypeAdapter.java:155)
        at com.acmerobotics.dashboard.config.variable.ConfigVariableSerializer.serialize(ConfigVariableSerializer.java:22)
        at com.acmerobotics.dashboard.config.variable.ConfigVariableSerializer.serialize(ConfigVariableSerializer.java:12)
        at com.google.gson.internal.bind.TreeTypeAdapter.write(TreeTypeAdapter.java:81)
        at com.google.gson.internal.bind.TreeTypeAdapter.write(TreeTypeAdapter.java:74)
        at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.write(TypeAdapterRuntimeTypeWrapper.java:69)
        at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.write(ReflectiveTypeAdapterFactory.java:125)
        at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.write(ReflectiveTypeAdapterFactory.java:243)
        at com.google.gson.Gson.toJson(Gson.java:669)
        at com.google.gson.Gson.toJson(Gson.java:648)
        at com.google.gson.Gson.toJson(Gson.java:603)
        at com.google.gson.Gson.toJson(Gson.java:583)
        at com.acmerobotics.dashboard.FtcDashboard.toJson(FtcDashboard.java:697)
        at com.acmerobotics.dashboard.DashboardWebSocket.send(DashboardWebSocket.java:72)
        at com.acmerobotics.dashboard.FtcDashboard.sendAll(FtcDashboard.java:889)
        at com.acmerobotics.dashboard.FtcDashboard.updateConfig(FtcDashboard.java:744)
        at com.acmerobotics.dashboard.FtcDashboard$8.run(FtcDashboard.java:1006)
rbrott commented 2 years ago

Yes, sprinkling synchronized here and there hardly wards off data races. The blast radius would be pretty small if configRoot were not blatantly leaked outside FtcDashboard. I'd love to make config trees immutable, but that's a significant change. I think the minimally invasive patch I can stomach is deprecating getConfigRoot() and adding a withConfigRoot() method that executes a lambda with the config root under the mutex (and trusts the user to keep the object inside the closure). And then synchronizing inside updateConfig() of course... slightly embarrassed I forgot that.

sbdevelops commented 2 years ago

Thanks for the quick response. What's the timeline on a fix being pushed to the maven repo?

rbrott commented 2 years ago

I (hopefully) pushed a fix. I'd like to get #72 in the next release as well. In the meantime, you can clone, run ./gradlew publishToMavenLocal in the dashboard root dir, add mavenLocal() to the RC project repositories, and bump the dash version from 0.4.3 to 0.4.4-SNAPSHOT.