baldurk / renderdoc

RenderDoc is a stand-alone graphics debugging tool.
https://renderdoc.org
MIT License
9.06k stars 1.35k forks source link

Launching qrenderdoc with --replayhost for an Android device fails to connect to its replay server #3445

Closed jcortell68 closed 1 month ago

jcortell68 commented 1 month ago

Description

RenderDoc supports launching it with a --replayhost option to have it automatically target a remote host (device).

For Android, if you have a device ID 01234567 (what's shown in adb devices), you would launch qrenderdoc as follows:

qrenderdoc.exe --replayhost adb://01234567 

When you do this, you'll see RenderDoc interact with the device. It will launch the replay server app (restart it if it's already running), but then RenderDoc times out trying to connect to it. If you then select the device in the Replay Context selector, you'll see the replay server app on the device get killed and relaunched, but the outcome is the same.

image

This isn't an issue with the host not having enough time to connect with the device. There's a bug in the code that leads to this situation (see more in a follow up comment I'll add).

Steps to reproduce

From a terminal, launch RenderDoc using the cmdline shown above, replacing 01234567 with the adb device ID of your Android phone. You'll see the behavior I described

Environment

jcortell68 commented 1 month ago

I debugged the issue and found that there was some code in android.cpp that seems to unintentionally add a device to the std::map AndroidController::devices. The code should be checking whether the device is in the map before trying to use bracket access to the map, as that creates an entry in the map with that key. This unintentional addition causes code elsewhere to not run--code that not only adds it to that map but at the same time sets up the adb port forwarding for the device and sets the device object's portbase field. Ultimately, the lack of port forwarding and the portbase field not being set is what causes RenderDoc to not connect to the device.

I.e., this fixes the problematic behavior:

diff --git a/renderdoc/android/android.cpp b/renderdoc/android/android.cpp
index e88f6c59c..e1c5a39f9 100644
--- a/renderdoc/android/android.cpp
+++ b/renderdoc/android/android.cpp
@@ -1254,7 +1254,9 @@ struct AndroidController : public IDeviceProtocolHandler

     {
       SCOPED_LOCK(lock);
-      portbase = devices[deviceID].portbase;
+      if (devices.find(deviceID) != devices.end()) {
+        portbase = devices[deviceID].portbase;
+      }
     }

     if(portbase == 0)
jcortell68 commented 1 month ago

I can submit a PR if you prefer that to just the diff above.

MartynJacques-Arm commented 1 month ago

The root cause is likely that the remote host probe thread hasn’t completed probing before the setRemoteHost call connects to the host. Consequently, when RemapPort is called, the device hasn’t been initialised by the probe and added to the devices map.

As suggested in your patch, we should account for the device not being in the device map, but we should also address the underlying issue that leads to this situation.

cmannett85-arm commented 1 month ago

I can't reproduce this on v1.34 (or v1.35) on either Windows or Linux using the same ADB version. I suspect it may be device dependent, perhaps just from timing variation assuming the comment by @MartynJacques-Arm is correct. What device are you experiencing this on?

jcortell68 commented 1 month ago

Pixel 7 with Android 14. Host is Windows 10 Enterprise. I just tried it with v1.34 and it failed ten out of ten times. And yes, the remotes (devices) probing is done on a thread and so in theory this should be a race condition. But the window of success for me seems very small.

baldurk commented 1 month ago

That commit should fix the issue, waiting for enumeration to finish the first time before trying to connect. I also added safety checking to the devices lookup though in theory it should only happen through improper use of the replay API