baldurk / renderdoc

RenderDoc is a stand-alone graphics debugging tool.
https://renderdoc.org
MIT License

RenderDoc is unable to connect and debug on Samsung S23 Ultra #3283

Closed unrealsid closed 3 months ago

unrealsid commented 6 months ago

Description

Hello @baldurk,

Hope you're well. I've been using RenderDoc for some time to debug an Android application I'm developing. It worked quite well until an Android system update was installed on the Samsung S23 Ultra, and now RenderDoc cannot correctly connect to or debug on the device. I am running the latest version of RenderDoc.

Steps to reproduce

Reproduction steps are as follows:

  1. Enable developer mode on the mobile device and start wireless debugging.
  2. Ensure USB debugging is permitted for the computer.
  3. Open a command prompt to connect the PC to the Samsung device. At the prompt, type adb connect xyz.xyz.xyz.xyz:port, where xyz.xyz.xyz.xyz:port is the local IP address and port shown on the Android device after enabling wireless debugging, and ensure adb is correctly connected to the device.
  4. Start RenderDoc on the PC. Under the Replay Context drop-down menu, the Samsung device should be visible. Select it. This starts running the Remote Server prompt on RenderDoc on the PC. The RenderDocCMD application on the Android device starts up, but then shows a warning saying this is an older version of the application, and the connection between RenderDoc on the PC and the Android application fails, with RenderDoc on the PC reporting that it was unable to start the Remote CMD application. [image] Clicking Check for Update does not resolve the issue.
  5. Any further connection attempts from RenderDoc on the PC fail and I am unable to debug the application I'm developing on my android device.
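
Step 3 asks to ensure adb is correctly connected. A minimal Python sketch of that check follows, assuming the usual `adb devices` output of a header line followed by one `serial<TAB>state` line per device. The helper names `parse_adb_devices` and `is_ready`, and the sample serials, are hypothetical and not part of RenderDoc or adb.

```python
def parse_adb_devices(output: str) -> dict[str, str]:
    """Map each device serial to its state, given `adb devices` output."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header line and adb daemon status lines.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def is_ready(devices: dict[str, str], serial: str) -> bool:
    """A device is usable only when adb reports its state as 'device'."""
    return devices.get(serial) == "device"


# Hypothetical sample output. A real run could capture it with
# subprocess.run(["adb", "devices"], capture_output=True, text=True).stdout
sample = (
    "List of devices attached\n"
    "192.168.1.20:5555\tdevice\n"
    "R5CT000000\tunauthorized\n"
)
found = parse_adb_devices(sample)
```

A state such as `unauthorized` or `offline` would mean the device is visible to adb but not yet usable, which is worth ruling out before looking at RenderDoc itself.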

Environment

Would be happy if you could please help me resolve this issue. RenderDoc is an integral part of the development and not being able to use it for debugging has put my Android app on hold. Should you require more information for debugging/helping resolve the issue, I would be glad to provide it. Thanks again,

baldurk commented 6 months ago

The update message is a known problem with the Android system, it is a false positive and can be safely ignored. It's not clear from your description - did you try clicking "OK"? It should go away and function as normal, and should only appear the first time you run a new RenderDoc version.

I don't know what the adb connect command does, it's possible it is broken or causing problems. Can you try disconnecting everything, shutting down all running RenderDoc and android programs on the PC, restarting the phone, and then starting up and plugging in the phone without running adb connect to see if that works? Clicking OK on the update popup if it appears.

unrealsid commented 6 months ago

Yes, I did click OK on the prompt, but if it's a false positive, then that likely won't be the root of the problem. And yes, it only appeared once.

adb connect is a command used to establish wireless debug connectivity to an Android device. It used to work until a few months ago. I managed to get the RenderDoc app working with my Android app after following your suggestions. It connects and works well with the device plugged in via USB.

I seem to be running into 2 other issues after this. In RenderDoc on the PC, after setting the Replay Context to my Samsung S23U and having RenderDoc launch the mobile app I'm working on, I select the Capture Frames Immediately option in the PC app. The app does some background work and captures frame data, and I then get the following error: [image]

I do see the Capture in the Captures Collected panel but the remote server disconnects.

I need to then reconnect RenderDoc on the PC via the Replay Context dropdown and it then seems to be able to open the Captured data well after that. It's not a major issue since I'm still able to read the data, but I'm wondering if there's a way around it?

And also, another thing I observed: I have two buttons in my app, they usually work well. One opens an Android System file picker and the other loads a mesh when clicked. But when I connect RenderDoc and I press either of the buttons, my app hangs. Is there an issue with clicking Android UI Buttons when RenderDoc is connected to Android?

Thanks

cmannett85-arm commented 6 months ago

Hi @unrealsid, I too have had problems with wireless ADB - my hunch is that it doesn't play nice with the port forwarding RD sets up but I've not looked at it too closely.

I do see the Capture in the Captures Collected panel but the remote server disconnects.

As a quick sanity check, can you increase the network timeout? Go to Tools->Settings->Core->Config Editor, then RemoteServer->TimeoutMS, and up it to something stupid like 120000. If it works, then we know that something is stalling the phone a little but otherwise it's working fine. If it still fails instantly, then something at the capture end is likely crashing the server. If it still fails but takes two minutes longer to do so, then we know something is stalling the device indefinitely (a real bug!).
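
The three outcomes of that timeout experiment can be written down as a small decision function. This is only a sketch of the reasoning above: the function name `classify_timeout_test` and the `instant_ms` cut-off for "fails instantly" are assumptions, not RenderDoc settings.

```python
def classify_timeout_test(succeeded: bool, elapsed_ms: int,
                          timeout_ms: int = 120000,
                          instant_ms: int = 2000) -> str:
    """Interpret one capture attempt made with a raised TimeoutMS.

    instant_ms is a hypothetical threshold for "fails instantly".
    """
    if succeeded:
        # The longer timeout was enough, so the phone is only briefly stalled.
        return "brief stall on the phone, otherwise working"
    if elapsed_ms < instant_ms:
        # The timeout never came into play, so the server probably died.
        return "server likely crashing at the capture end"
    if elapsed_ms >= timeout_ms:
        # The whole (longer) timeout was consumed before failing.
        return "device stalled indefinitely, a real bug"
    return "inconclusive"


result_ok = classify_timeout_test(True, 30000)
result_crash = classify_timeout_test(False, 500)
result_stall = classify_timeout_test(False, 121000)
```

The "inconclusive" branch covers a failure that is neither immediate nor at the timeout, a case the original advice does not address.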

Is there an issue with clicking Android UI Buttons when RenderDoc is connected to Android?

Not in principle, but it depends on what the buttons do. If they both behave the same and one of them only opens the OS file picker then I can't see how RD would affect that - it's only interested in graphics API interception after all. Is there anything interesting in logcat at the time of the button presses?

unrealsid commented 6 months ago

Hello @cmannett85-arm, just to quickly confirm, after increasing the network timeout, you want me to connect via USB or via wireless ADB?

About the second issue, I'll check logcat and report my results back here shortly. Thanks.

cmannett85-arm commented 6 months ago

USB ADB for now, let's tackle one problem at a time.

unrealsid commented 6 months ago

I set the timeout to 120000 ms in the advanced settings and then proceeded as follows:

  1. I set the ReplayContext to my phone. RenderDoc runs some remote commands. Remote server is still connected.
  2. I launched the application via RenderDoc. The remote server was still connected.
  3. I proceeded to take a capture. RenderDoc on the PC hung for a bit and the remote server immediately got disconnected. The device data was captured, however.
  4. I then reselect my phone from the Replay Context in RenderDoc and I'm able to open the capture and view the information in it.

unrealsid commented 6 months ago

Regarding the freezing issue, all I get is the following log info when I press a UI button in my app and RenderDoc is connected:

Activity reported stop, but no longer stopping
ANR in com.viewer.fbxviewer (com.viewer.fbxviewer/.MainActivity)
                 PID: 22171
                 Reason: Input dispatching timed out (com.viewer.fbxviewer.MainActivity (server) is not responding. Waited 10001ms for FocusEvent(hasFocus=false))
                 Parent: com.viewer.fbxviewer/.MainActivity

And the app then crashes and the remote server disconnects. I don't really see a reason for the ANR in the log.
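
For anyone triaging similar reports, the key fields of the ANR entry quoted above can be pulled out with a few regular expressions. This is a rough Python sketch that assumes the log layout shown in this comment; the function name `parse_anr` is hypothetical.

```python
import re

# Patterns written against the ANR lines quoted above (assumed layout).
ANR_HEADER = re.compile(r"ANR in (?P<package>[\w.]+) \((?P<component>[^)]+)\)")
ANR_PID = re.compile(r"PID:\s*(?P<pid>\d+)")
ANR_WAIT = re.compile(r"Waited (?P<ms>\d+)ms for (?P<event>\w+)")


def parse_anr(log: str) -> dict:
    """Extract package, PID, wait time and pending event from an ANR entry."""
    info = {}
    header = ANR_HEADER.search(log)
    if header:
        info["package"] = header["package"]
        info["component"] = header["component"]
    pid = ANR_PID.search(log)
    if pid:
        info["pid"] = int(pid["pid"])
    wait = ANR_WAIT.search(log)
    if wait:
        info["waited_ms"] = int(wait["ms"])
        info["event"] = wait["event"]
    return info


sample_log = """ANR in com.viewer.fbxviewer (com.viewer.fbxviewer/.MainActivity)
                 PID: 22171
                 Reason: Input dispatching timed out (com.viewer.fbxviewer.MainActivity (server) is not responding. Waited 10001ms for FocusEvent(hasFocus=false))
"""
anr = parse_anr(sample_log)
```

Here the pending event is a focus change, which suggests the main thread stopped processing input for about ten seconds around the button press.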

This isn't an issue when the objects I'm drawing on screen are loaded the moment the app loads. Which is how I'm testing content at the moment. Ideally, I'd be glad if I could test items as I load them on UI button presses.

Thanks.

cmannett85-arm commented 6 months ago

RenderDoc on the PC hung for a bit and the remote server immediately got disconnected

Did the remote server disconnect at the start of the hang or at the end? In the device tab do you see the 'Capture in progress' progress bar filling up? What do you see in logcat during capture?

This isn't an issue when the objects I'm drawing on screen are loaded the moment the app loads.

Loaded from where to where? Is this a Vulkan or GLES app?

unrealsid commented 6 months ago

After restarting the Android device, I'm seeing one of 3 things happen at random each time I try a capture:

Yes, I see a 'Capture in Progress' bar fill up.

Also, logcat has only the following info, mostly in the first 2 cases:

failed to connect to socket 'localabstract:renderdoc_39920': could not connect to localabstract address 'localabstract:renderdoc_39920'

I don't know if this is relevant, but I see a lot of __rdoc_internal_android_logcat 345569 messages also.

The objects I'm drawing on screen are loaded from the disk using the Default file selector on Android and are drawn on screen. It is a GLES application. Thanks

cmannett85-arm commented 6 months ago

I don't know if this is relevant, but I see a lot of __rdoc_internal_android_logcat 345569 messages also.

You can filter those out, they're internal messages used by RD.
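
A minimal Python sketch of that filtering, assuming the logcat output has already been captured as a list of text lines; the function name `drop_renderdoc_internal` and the sample lines are hypothetical.

```python
# Tag taken from the messages mentioned above.
RD_INTERNAL_TAG = "__rdoc_internal_android_logcat"


def drop_renderdoc_internal(lines: list[str]) -> list[str]:
    """Return the logcat lines that are not RenderDoc's internal messages."""
    return [line for line in lines if RD_INTERNAL_TAG not in line]


# Hypothetical captured lines, for illustration only.
captured = [
    "I renderdoc: __rdoc_internal_android_logcat 345569",
    "E adb: failed to connect to socket 'localabstract:renderdoc_39920'",
    "I renderdoc: __rdoc_internal_android_logcat 345570",
]
visible = drop_renderdoc_internal(captured)
```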

failed to connect to socket 'localabstract:renderdoc_39920': could not connect to localabstract address 'localabstract:renderdoc_39920'

This is the remote server port, you shouldn't be getting connection failures down a wired connection...
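
One way to check whether a forward for that socket exists is to inspect `adb forward --list`, which prints one `serial local remote` entry per line. The Python sketch below assumes that output format and that the forward targets a `localabstract:renderdoc_` socket as in the error message; the function name, serial and host port are hypothetical.

```python
def renderdoc_forwards(forward_list: str) -> list[tuple[str, str, str]]:
    """Pick out forwards that target a localabstract:renderdoc_* socket."""
    matches = []
    for line in forward_list.splitlines():
        parts = line.split()
        # Each entry is expected to be: <serial> <local> <remote>
        if len(parts) == 3 and parts[2].startswith("localabstract:renderdoc_"):
            matches.append((parts[0], parts[1], parts[2]))
    return matches


# Hypothetical sample. A real run could capture it with
# subprocess.run(["adb", "forward", "--list"], capture_output=True, text=True).stdout
sample_forwards = (
    "R5CT000000 tcp:39920 localabstract:renderdoc_39920\n"
    "R5CT000000 tcp:8080 tcp:8080\n"
)
rd_forwards = renderdoc_forwards(sample_forwards)
```

An empty result while RenderDoc is supposed to be connected would point at the forwarding setup rather than at the application being captured.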

Are you able to share your APK so we can try debugging it? Or create a simple equivalent that displays the same issue?

unrealsid commented 6 months ago

Yes, I can create a simpler equivalent of that and send it across. But that will take me a bit of time. I'll try to send it as soon as possible.

Do you have a Samsung S23 Ultra device on hand to test this?

Thanks.

cmannett85-arm commented 6 months ago

Do you have a Samsung S23 Ultra device on hand to test this?

No, I have modern Samsung phones running the same Android version though, so if it's down to some funky Samsung bloatware on the device affecting RD I should be able to reproduce it.

The Samsung S23 Ultra uses a Qualcomm Adreno 740 GPU, so if your problem is down to an odd interaction between RD and the GPU driver I'll struggle to help you.

unrealsid commented 5 months ago

Hello @cmannett85-arm, I've built a test app that somewhat mirrors my own production application. Where can I send it to you? I do not want to put a download link in the comments. Thanks.

cmannett85-arm commented 5 months ago

Thanks @unrealsid, you can send it to me in an email to camden.mannett@arm.com.

cmannett85-arm commented 5 months ago

@unrealsid have you sent the email? I haven't received anything and there's nothing in my email quarantine.

unrealsid commented 5 months ago

Hello @cmannett85-arm, sorry it's taking a while here. I had to make a few adjustments to the app before sending it and have been pretty caught up on other fronts. I'll be making some time to make those adjustments and sending it to you in the next few days. Thanks for checking in. :)

unrealsid commented 5 months ago

Hello @cmannett85-arm, I've sent you an email with a link to the app. Thanks

cmannett85-arm commented 5 months ago

Hi @unrealsid, just letting you know we haven't forgotten about this. I tried your test app on a few different devices:

However, unlike your S23 Ultra, the A34 hangs immediately. With a debugger attached, I can see the OS fire a STOP signal at the app once it has realised the app has frozen. Judging by the call stacks, it looks like there are OpenGL ES calls coming from two different threads: [image] [image]

Both pass through RD and both are stuck waiting for a mutex. Sadly, without more debug info I can't tell whether they're waiting for the same mutex.

unrealsid commented 5 months ago

Hey @cmannett85-arm, thanks for the updates. What more information would you require?

cmannett85-arm commented 4 months ago

I've gotten a little further with this. RD uses a global OpenGL ES lock called glLock, and on my test device I see this issue on every run:

GLThread:
    glLock locked by glClear call:
        android::BufferQueueProducer::waitForFreeSlotThenRelock(android::BufferQueueProducer::FreeSlotCaller, std::unique_lock<…> &, int *) const
        android::BufferQueueProducer::dequeueBuffer(int *, android::sp<…> *, unsigned int, unsigned int, int, unsigned long, unsigned long *, android::FrameEventHistoryDelta *)
        android::Surface::dequeueBuffer(ANativeWindowBuffer **, int *)
        ...
        glClear 0x00000070014efbfc
        WrappedOpenGL::glClear(unsigned int) gl_draw_funcs.cpp:4572
        glClear_renderdoc_hooked(unsigned int) gl_hooks.cpp:167
        <unknown> 0x000000701e699914
        <unknown> 0x000000701e6997d0

RenderThread:
    glObjectLabelKHR blocked waiting for glLock:
        NonPI::MutexLockWithTimeout(pthread_mutex_internal_t *, bool, const timespec *) 0x00000070d67bb2bc
        Threading::CriticalSectionTemplate::Lock() posix_threading.cpp:95
        Threading::ScopedLock::ScopedLock(Threading::CriticalSectionTemplate<…> *) threading.h:39
        glObjectLabelKHR_renderdoc_hooked(RDCGLenum, unsigned int, int, const char *) gl_hooks.cpp:167
        set_khr_debug_label(GrGLGpu*, unsigned int, std::__1::basic_string_view<char, std::__1::char_traits<char>>) (.__uniq.111230615403708898952873255848304878871) 0x00000070c78d8708
        GrGLGpu::createTexture(SkISize, GrGLFormat, unsigned int, GrRenderable, GrGLTextureParameters::SamplerOverriddenState *, int, GrProtected, std::string_view) 0x00000070c78d6ec4
        GrGLGpu::onCreateTexture(SkISize, const GrBackendFormat &, GrRenderable, int, skgpu::Budgeted, GrProtected, int, unsigned int, std::string_view) 0x00000070c78d682c
        ...
        android::uirenderer::renderthread::RenderThread::threadLoop() 0x00000070c7557228
        android::Thread::_threadLoop(void *) 0x00000070bdde9310
        __pthread_start(void *) 0x00000070d67b9c30
        __start_thread 0x00000070d674da04

MainThread:
    Blocked waiting for a signal from the RenderThread:
        android::uirenderer::renderthread::DrawFrameTask::drawFrame()

The gist is that MainThread is blocked waiting for the RenderThread, which is blocked waiting for GLThread to release glLock. GLThread is stuck because glClear calls back into the platform, and that call is waiting on something before it can return and release glLock.
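
The chain described above can be modelled as a small wait-for graph and walked to find what ultimately blocks the app. This Python sketch only restates the stack traces; the dictionary contents paraphrase them and the function name `blocking_chain` is hypothetical.

```python
# Who waits on what, paraphrased from the three stacks above.
WAITS_ON = {
    "MainThread": "RenderThread",    # waiting for a signal in drawFrame()
    "RenderThread": "GLThread",      # waiting for glLock in glObjectLabelKHR
    "GLThread": "BufferQueue slot",  # holds glLock, blocked in dequeueBuffer
}


def blocking_chain(start: str, waits_on: dict[str, str]) -> list[str]:
    """Follow wait-for edges from start until a root blocker or a cycle."""
    chain = [start]
    seen = {start}
    while chain[-1] in waits_on:
        nxt = waits_on[chain[-1]]
        chain.append(nxt)
        if nxt in seen:
            break  # a repeated node means a true circular deadlock
        seen.add(nxt)
    return chain


chain = blocking_chain("MainThread", WAITS_ON)
```

Because the walk ends at a resource rather than looping back to a thread, this looks less like a circular deadlock and more like a stall on a free buffer slot, which matches the second open question below.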

There are two questions to resolve:

  1. Why are multiple threads doing GL calls? @unrealsid are you using multiple graphics frameworks?
  2. What's consuming the android::BufferQueueProducer slots?

unrealsid commented 4 months ago

Hello @cmannett85-arm:

  1. I'm only using OpenGL ES in the extended version of the application. But I did notice that the OpenGL ES calls seem to be wrapped in Vulkan calls. I ran AGI and got this for a single draw call. Maybe it's related? [image]
  2. Would you need any kind of source code from me?

Also, are these issues happening when you press a button on the app?

cmannett85-arm commented 4 months ago

  1. I wouldn't be surprised if on Adreno GLES is implemented in Vulkan in the driver. This isn't visible to RD, but if AGI is getting all its data from Perfetto then maybe the driver is reporting its Vulkanness through that. You'll have to ask Samsung about that though.
  2. Anything that might be relevant is always worth a look. You can send it to my email address and it'll remain private

Also, are these issues happening when you press a button on the app?

No, it happens on rendering start so we could be looking at two different issues.

baldurk commented 3 months ago

Talking to Cam, there doesn't seem to be anything more we can investigate on this from our side, so I'm closing this now due to lack of activity. If you have more information to share or more reproduction information, please feel free to open a new issue.