baldurk / renderdoc

RenderDoc is a stand-alone graphics debugging tool.
https://renderdoc.org
MIT License
8.82k stars 1.32k forks source link

Wayland support on Linux #853

Open ebkalderon opened 6 years ago

ebkalderon commented 6 years ago

Many Unix-like platforms are in the process of adopting the more modern Wayland protocol in place of X11, so it would be great to have native Wayland support in the RenderDoc in-application API. Currently, both RenderDoc and the games need to be launched using the XWayland compatibility layer to work properly, which is a bit irritating.

Introducing support for Wayland would require the following changes:

  1. DevicePointer should support an EGLSurface (seems like issue #253 is relevant here).
  2. WindowHandle should support a wl_surface.
baldurk commented 6 years ago

This seems like an odd angle to approach from. The in-application API just falls out of what I can actually hook and capture - the DevicePointer / WindowHandle are cast to void* so it's a question of what window/API handles get registered as to which pairs will work when passed from the application.

I don't know much about wayland but DevicePointer is identifying the API instance itself, so it would be EGLContext and the WindowHandle would be the EGLSurface, but I don't know how wayland initialises things. If it's using EGL then yes it will go through eglCreateContext but I believe that should already work on linux, the hooking was set up by @galpeter while bringing up GLES support.

Can you give more details about what you're actually trying to do and what goes wrong? That would help to give context. Once you're able to capture your program properly then either the in-application API will already work, or it might need a minor bugfix.

I don't think I can test wayland directly here, I think it became available on ubuntu 17.10 on one of my linux installs but I immediately disabled it since it proved buggy and kind of limited. In theory I don't have any problem supporting wayland but it's probably not an immediate priority until it becomes more stable and widespread.

ebkalderon commented 6 years ago

Thanks for the quick response! I already know that both types are typedefs to void*, but I couldn't think of a better way to phrase it in the issue. What I probably should have said is that the methods accepting DevicePointer and WindowHandle should support the types above.

I was actually unaware that EGL was already working in RenderDoc since the in-application API docs do not mention it as a supported option. The game code in question doesn't pass in an EGLContext for RENDERDOC_SetActiveWindow since I assumed RenderDoc wouldn't support it. Thanks for the heads up; I will amend my code and try it out.

EDIT: To my knowledge, the correct window handle type for Wayland is wl_surface, not EGLSurface. I'll need to investigate whether I can acquire an EGLSurface from an existing wl_surface.

baldurk commented 6 years ago

It's not in the docs yet since GLES support in general is not officially shipped :smile:. It will be soon™ at which point I'll update the docs.

ebkalderon commented 6 years ago

Great to hear! I'll get back to you if it works for me.

EDIT: @baldurk So, I've amended my code, built qrenderdoc from the 1.x branch, and launched the application from within the Launch Application tab, and it says "Connection Established" but "API: None". Here's the diagnostic log file:

RDOC 026585: [13:46:39]             core.cpp( 298) - Log     - RenderDoc v1.0 64-bit Development (969fdaa49921d8a4a6c73c740abf20b36ccc2325) loaded in replay application
QTRD 026585: [13:46:39]       qrenderdoc.cpp(  87) - Log     - QRenderDoc initialising.
QTRD 026585: [13:46:39]              unknown(   0) - Warning - Empty filename passed to function
QTRD 026585: [13:46:39]              unknown(   0) - Warning - Empty filename passed to function
RDOC 026585: [13:46:39]    android_tools.cpp( 272) - Log     - COMMAND: /opt/android-sdk/platform-tools/adb 'devices'
QTRD 026585: [13:46:47]              unknown(   0) - Warning - Empty filename passed to function
QTRD 026585: [13:46:47]              unknown(   0) - Warning - Empty filename passed to function
RDOC 026585: [13:46:47]    android_tools.cpp( 272) - Log     - COMMAND: /opt/android-sdk/platform-tools/adb 'devices'
RDOC 026605: [13:46:47]             core.cpp( 298) - Log     - RenderDoc v1.0 64-bit Development (969fdaa49921d8a4a6c73c740abf20b36ccc2325) capturing application
RDOC 026605: [13:46:47]   posix_libentry.cpp(  77) - Log     - Loading into /home/ekalderon/Documents/renderdoc-rs/target/debug/examples/triangle
RDOC 026605: [13:46:47]            hooks.cpp(  45) - Debug   - Hooking libGL.so
RDOC 026605: [13:46:47]            hooks.cpp(  45) - Debug   - Hooking libvulkan.so.1
RDOC 026605: [13:46:47]          app_api.cpp( 255) - Log     - Initialising RenderDoc API version 1.1.1 for requested version 10100
RDOC 026605: [13:46:47]             core.cpp( 495) - Error   - Couldn't find frame capturer for device 00007FDDB4DE30A0 window 0000000000000000
RDOC 026585: [13:46:48]   target_control.cpp( 512) - Log     - Got remote handshake: triangle [26605]

In this case, I'm passing in an EGLContext as the DevicePointer and a wildcard for the WindowHandle.

ebkalderon commented 6 years ago

Interestingly enough, when I launch the application through X11, RenderDoc correctly identifies the API as OpenGL, but still produces the same Couldn't find frame capturer for device error as in Wayland above.

RDOC 002368: [21:04:28]             core.cpp( 298) - Log     - RenderDoc v1.0 64-bit Development (969fdaa49921d8a4a6c73c740abf20b36ccc2325) loaded in replay application
QTRD 002368: [21:04:28]       qrenderdoc.cpp(  87) - Log     - QRenderDoc initialising.
QTRD 002368: [21:04:28]              unknown(   0) - Warning - Empty filename passed to function
QTRD 002368: [21:04:28]              unknown(   0) - Warning - Empty filename passed to function
RDOC 002368: [21:04:28]    android_tools.cpp( 272) - Log     - COMMAND: /opt/android-sdk/platform-tools/adb 'devices'
RDOC 002384: [21:04:35]             core.cpp( 298) - Log     - RenderDoc v1.0 64-bit Development (969fdaa49921d8a4a6c73c740abf20b36ccc2325) capturing application
RDOC 002384: [21:04:35]   posix_libentry.cpp(  77) - Log     - Loading into /home/ekalderon/Documents/renderdoc-rs/target/debug/examples/triangle
RDOC 002384: [21:04:35]            hooks.cpp(  45) - Debug   - Hooking libGL.so
RDOC 002384: [21:04:35]            hooks.cpp(  45) - Debug   - Hooking libvulkan.so.1
RDOC 002384: [21:04:35]          app_api.cpp( 255) - Log     - Initialising RenderDoc API version 1.1.1 for requested version 10100
RDOC 002384: [21:04:35]       linux_hook.cpp(  98) - Debug   - Redirecting dlopen to ourselves for libGL.so.1
RDOC 002384: [21:04:35]   gl_hooks_linux.cpp( 572) - Debug   - glXCreateContextAttribsARB:
RDOC 002384: [21:04:35]   gl_hooks_linux.cpp( 579) - Debug   - 2091: 3
RDOC 002384: [21:04:35]   gl_hooks_linux.cpp( 579) - Debug   - 2092: 2
RDOC 002384: [21:04:35]   gl_hooks_linux.cpp( 579) - Debug   - 9126: 1
RDOC 002384: [21:04:35]   gl_hooks_linux.cpp( 579) - Debug   - 2094: 0
RDOC 002384: [21:04:35]        gl_common.cpp( 408) - Log     - Vendor checks for 0 (Intel Open Source Technology Center / Mesa DRI Intel(R) HD Graphics 620 (Kaby Lake GT2)  / 4.5 (Core Profile) Mesa 17.3.3)
RDOC 002384: [21:04:35]      gl_emulated.cpp(1602) - Log     - Emulating EXT_direct_state_access
RDOC 002384: [21:04:35]        gl_common.cpp( 690) - Warning - FBOs are shared on this implementation
RDOC 002384: [21:04:35]             core.cpp( 495) - Error   - Couldn't find frame capturer for device 00007F10121E9700 window 0000000000000000
RDOC 002368: [21:04:35]   target_control.cpp( 512) - Log     - Got remote handshake: triangle [2384]
RDOC 002368: [21:04:35]   target_control.cpp( 717) - Log     - Used API: OpenGL (Presenting & supported)
QTRD 002368: [21:04:36]              unknown(   0) - Warning - Empty filename passed to function
QTRD 002368: [21:04:36]              unknown(   0) - Warning - Empty filename passed to function
RDOC 002368: [21:04:36]    android_tools.cpp( 272) - Log     - COMMAND: /opt/android-sdk/platform-tools/adb 'devices'
baldurk commented 6 years ago

Oops, apparently ctrl-enter posts a comment.

The device handle looks quite weird, 00007F10121E9700 seems more like a heap address than a context handle (although context handles could be anything in theory). Can you add a print in WrappedOpenGL::CreateContext printing out the ctxdata.ctx value? Like so:

RDCLOG("Created context %p", ctxdata.ctx);

(Note v1.x may be unstable/broken at the moment which might contribute to this).

ebkalderon commented 6 years ago

It's possible, yes. But I'm still happy to have RenderDoc working great on X11, at least!

Back on topic, though, I found this informative blog post describing how EGL contexts and surfaces fit into the greater Wayland protocol stack, though as a disclaimer, it is from 2012 and might be very out of date by now.

ebkalderon commented 6 years ago

@baldurk I added the line you requested to renderdoc/driver/gl/gl_driver.cpp and recompiled, and this is what I got:

RDOC 027601: [22:32:01]             core.cpp( 298) - Log     - RenderDoc v1.0 64-bit Development (969fdaa49921d8a4a6c73c740abf20b36ccc2325) capturing application
RDOC 027601: [22:32:01]   posix_libentry.cpp(  77) - Log     - Loading into /home/ekalderon/Documents/renderdoc-rs/target/debug/examples/triangle
RDOC 027601: [22:32:01]            hooks.cpp(  45) - Debug   - Hooking libGL.so
RDOC 027601: [22:32:01]            hooks.cpp(  45) - Debug   - Hooking libvulkan.so.1
RDOC 027601: [22:32:01]          app_api.cpp( 255) - Log     - Initialising RenderDoc API version 1.1.1 for requested version 10100
RDOC 027601: [22:32:01]       linux_hook.cpp(  98) - Debug   - Redirecting dlopen to ourselves for libGL.so.1
RDOC 027601: [22:32:01]   gl_hooks_linux.cpp( 572) - Debug   - glXCreateContextAttribsARB:
RDOC 027601: [22:32:01]   gl_hooks_linux.cpp( 579) - Debug   - 2091: 3
RDOC 027601: [22:32:01]   gl_hooks_linux.cpp( 579) - Debug   - 2092: 2
RDOC 027601: [22:32:01]   gl_hooks_linux.cpp( 579) - Debug   - 9126: 1
RDOC 027601: [22:32:01]   gl_hooks_linux.cpp( 579) - Debug   - 2094: 0
RDOC 027601: [22:32:01]        gl_driver.cpp( 875) - Log     - Created context 00007FADD73E9700
RDOC 027601: [22:32:01]        gl_common.cpp( 408) - Log     - Vendor checks for 0 (Intel Open Source Technology Center / Mesa DRI Intel(R) HD Graphics 620 (Kaby Lake GT2)  / 4.5 (Core Profile) Mesa 17.3.3)
RDOC 027601: [22:32:01]      gl_emulated.cpp(1602) - Log     - Emulating EXT_direct_state_access
RDOC 027601: [22:32:01]        gl_common.cpp( 690) - Warning - FBOs are shared on this implementation
RDOC 027601: [22:32:01]             core.cpp( 495) - Error   - Couldn't find frame capturer for device 00007FADD73E9700 window 0000000000000000
RDOC 026167: [22:32:02]   target_control.cpp( 512) - Log     - Got remote handshake: triangle [27601]
RDOC 026167: [22:32:02]   target_control.cpp( 717) - Log     - Used API: OpenGL (Presenting & supported)

Looks like that value is indeed what it received, somehow.

EDIT: I checked /proc/self/maps, and it looks like the heap on my computer is from 000055deb4a63000 to 000055deb4a84000, so that address cannot be on the heap.

baldurk commented 6 years ago

Fair enough, I'd normally seen context handles as more.. 'handley' but this is perfectly fine I just wanted to check.

I ran a test myself, I patched one of ogl-samples to call into renderdoc's API, and it worked OK when I tested it. This was on latest v1.x as of now (there was one fix that went in today, but I think only for something that broke since the commit you were working from).

The diff is here if you want to try it:

diff --git a/samples/CMakeLists.txt b/samples/CMakeLists.txt
index 2ade0992..b019b246 100644
--- a/samples/CMakeLists.txt
+++ b/samples/CMakeLists.txt
@@ -11,7 +11,7 @@ function(glCreateSampleGTC NAME)
    add_executable(${SAMPLE_NAME} ${GL_PROFILE_GTC}-${GL_VERSION_GTC}-${NAME}.cpp ${SHADER_PATH})
    add_test(NAME ${SAMPLE_NAME} COMMAND $<TARGET_FILE:${SAMPLE_NAME}>)

-   target_link_libraries(${SAMPLE_NAME} ${FRAMEWORK_NAME} ${BINARY_FILES})
+   target_link_libraries(${SAMPLE_NAME} ${FRAMEWORK_NAME} ${BINARY_FILES} -ldl)
    add_dependencies(${SAMPLE_NAME} glfw ${FRAMEWORK_NAME} ${COPY_BINARY})

    install(TARGETS ${SAMPLE_NAME} DESTINATION .)
diff --git a/samples/gl-320-fbo-layered.cpp b/samples/gl-320-fbo-layered.cpp
index 6ade827a..8e240f9e 100644
--- a/samples/gl-320-fbo-layered.cpp
+++ b/samples/gl-320-fbo-layered.cpp
@@ -1,5 +1,13 @@
 #include "test.hpp"

+#define GLFW_EXPOSE_NATIVE_X11
+#define GLFW_EXPOSE_NATIVE_GLX
+#include <GLFW/glfw3native.h>
+
+#include <dlfcn.h>
+
+#include "/home/baldurk/renderdoc/renderdoc/api/app/renderdoc_app.h"
+
 namespace
 {
    char const* VERT_SHADER_SOURCE1("gl-320/fbo-layered.vert");
@@ -262,6 +270,37 @@ private:
    {
        glm::ivec2 WindowSize(this->getWindowSize());

+       GLXContext ctx = glfwGetGLXContext(glfwGetCurrentContext());
+
+       static int frame = 0;
+       frame++;
+
+       fprintf(stderr, "%d: current context %p\n", frame, ctx);
+
+       RENDERDOC_API_1_1_1 *api = NULL;
+
+       {
+           void *rdoc = dlopen("librenderdoc.so", RTLD_NOW | RTLD_GLOBAL | RTLD_NOLOAD);
+
+           if(rdoc)
+           {
+               pRENDERDOC_GetAPI getapi = (pRENDERDOC_GetAPI)dlsym(rdoc, "RENDERDOC_GetAPI");
+
+               if(getapi)
+               {
+                   getapi(eRENDERDOC_API_Version_1_1_1, (void **)&api);
+
+                   if(frame == 100)
+                       api->StartFrameCapture(ctx, NULL);
+               }
+           }
+       }
+
+       if(api)
+           fprintf(stderr, "%d: got api!\n", frame);
+       else
+           fprintf(stderr, "%d: no api\n", frame);
+
        glm::mat4 Projection = glm::ortho(-1.0f, 1.0f, 1.0f,-1.0f, 1.0f, -1.0f);
        glm::mat4 View = glm::mat4(1.0f);
        glm::mat4 Model = glm::mat4(1.0f);
@@ -301,6 +340,9 @@ private:
            }
        }

+       if(api && frame == 100)
+           api->EndFrameCapture(ctx, NULL);
+
        return true;
    }
 };

As a further test for your case, can you add a print in WrappedOpenGL::SwapBuffers just before it calls ctxdata.AssociateWindow something like:

RDCLOG("Associating context %p with window %p", ctxdata.ctx, windowHandle);

Just to check that the window is getting registered as well, although it shouldn't be needed. It would also be nice to debug through RenderDoc::MatchFrameCapturer before that error about not finding a frame capturer is printed, then you could see which pairs are registered and work from there.

I synced your sample code from renderdoc-rs but it doesn't seem like the one on github is set up to provide a context to start a capture. If you could push that code to a new branch or give me a patch, I could test it locally and see how I get on.

ebkalderon commented 6 years ago

Thanks for the diff! Unfortunately, v1.x doesn't build on my machine anymore after git pull, but I'll be sure to try your changes out once I get it building again.

The latest version that has this capability is in the sys-crate branch; the master branch holds stable legacy code. Check the included triangle example program for usage of the library, and you can find the DevicePointer and WindowHandle trait implementations in src/app/types.rs.

baldurk commented 6 years ago

What problem do you hit building latest v1.x? I'm not seeing it locally or on travis, but with the number of compiler versions around on linux it's easy for some new warning to be generated somewhere. If it's just a warning you can build in release mode, and then there's no warnings as errors, but I would like to fix it.

I switched to the sys-crate branch but it doesn't seem like it calls StartFrameCapture in the triangle example. I don't know anything about rust (I had to ask for help from someone to know how to build the code even!) so I don't know how to make any changes to the example to call the function approriately if that's what you mean. Could you give me a patch so that it precisely repros the case that you're seeing fail?

ebkalderon commented 6 years ago

Well, the specific error I'm hitting is at the 90% mark from linking librenderdoc.so:

/usr/bin/ld: cannot find -lrenderdoc_libentry
collect2: error: ld returned 1 exit status
make[2]: *** [renderdoc/CMakeFiles/renderdoc.dir/build.make:1364: bin/librenderdoc.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:182: renderdoc/CMakeFiles/renderdoc.dir/all] Error 2
make: *** [Makefile:130: all] Error 2

This is from me invoking cd build && cmake .. && make after pulling in the new changes.

As for the Rust code, I'm sorry for not having explained the build process or given further detail. If you need any help from me in getting the code built or understood, please feel free to ask! In the triangle example, I called SetActiveWindow instead of StartFrameCapture on the assumption that the correct context/window pair will be used for subsequent captures when run within RenderDoc. Is this not the case?

EDIT: D'oh! I fixed the build error by invoking make renderdoc_libentry before make. Sorry about that.

baldurk commented 6 years ago

That's a strange error, I did add that project recently to isolate off the module entry point, but unless there are some other errors I don't know why it would fail to link. Can you remove your build folder and see if it's maybe cmake caching that's causing the problem? I've definitely seen cmake cache far too aggressively before and end up in broken states. If that doesn't fix it, check if you have a build/renderdoc/librenderdoc_libentry.a which is where it should be produced. You can touch renderdoc/os/posix/posix_libentry.cpp to force a recompile of it, to examine what make does.

As to the rust code it's OK - once someone had pointed me at cargo build / cargo run --example triangle I had what I needed.

The SetActiveWindow call you linked won't work because of how I treat OpenGL contexts and windows. It's a bit of a long story (or rant!) but it's a common pattern in GL code to create an invisible window just for the sake of activating a context, especially during bootstrap. If I immediately consider a window active as soon as I see a glXMakeCurrent or wglMakeCurrent call, then you're likely to end up with an invisible window that's the current active window, and you'd have to toggle around to get to the 'real' active window.

So instead I only consider windows to be really present once you are presenting to them with glXSwapBuffers / SwapBuffers. At that point they are available and then you can call SetActiveWindow on them, but you're calling it before the window has ever been presented so the call fails.

Note that StartFrameCapture / EndFrameCapture are different because they are designed to allow for capturing headless/off-screen work. So if you pass a NULL window handle to those functions, but still specify the context, they will begin and end capturing even if no window is yet active.

ebkalderon commented 6 years ago

I already did try with a fresh build folder and no dice. However, calling make renderdoc_libentry by hand before invoking make solved the problem for me.

Oh, that's interesting! I didn't know that was how SetActiveWindow works internally. In that case, I'll need to document that in the comments and change my code to use Start/EndFrameCapture. Thanks for that valuable piece of info.

baldurk commented 6 years ago

I wonder if I'm missing a dependency then, to ensure renderdoc_libentry builds before renderdoc. I'll check that out.

For what it's worth, the SetActiveWindow in your code was redundant anyway, since the first window that becomes available becomes active by default - I was able to capture anyway with C even when the call failed.

ebkalderon commented 6 years ago

Whatever changes you made to the CMake scripts fixed the renderdoc_libentry issue for me! Thank you for that. I'm recompiling RenderDoc with OpenGL ES enabled as I type this.

I was calling SetActiveWindow to test whether my compile-time DevicePointer and WindowHandle type checks were holding and behaving properly, though now I know that calling SetActiveWindow that early on doesn't behave as expected. I removed the code from the example, and now I'm setting up some new code which uses Start/EndFrameCapture. Will keep you posted.

ebkalderon commented 6 years ago

Good news! Compiling RenderDoc with ENABLE_GLES works surprisingly well on Wayland. Calling Start/EndFrameCapture (passing the EGL context as a DevicePointer and a null pointer for the WindowHandle) works as expected. As seen in the screenshot below, the client-server connection works and the frame capture gets triggered correctly. 🎉

screenshot from 2018-02-02 21-33-24

Unfortunately, RenderDoc fails to pick up any input for the window (e.g. pressing C to capture), despite passing in a wildcard WindowHandle. I'm pretty sure we do need to wire in Wayland support for this to work.

With that said, from examining this Wayland input tutorial I found, implementing this support in RenderDoc should be doable. If the user provides the RenderDoc API with a wl_surface as a WindowHandle (roughly equivalent to an Xlib Window), you should be able to listen to the wl_keyboard associated with it. Then again, I'm not at all experienced in writing Wayland applications, so I wouldn't know how difficult this would be.

Once I familiarize myself with the RenderDoc code base, I might be willing to take a stab at it, if you'd like.

EDIT: Another caveat; it appears that launching the triangle example and pressing R to launch the replay UI with connectTargetControl set to true causes it to say "Connection: Established" but also "API: None", just like it did in X11. Starting it from within the Launch Application tab works fine.

baldurk commented 6 years ago

That's promising, that suggests that it's most of the way there. I believe vulkan will need explicit wayland support as EGL just overloads the same function used for xlib/xcb/whatever, but vulkan has separate entry points. There'll need to be a default-off ENABLE_WAYLAND flag similar to the default-on ENABLE_XLIB / XCB to cover it as I don't want to require wayland headers for every build.

As for keyboard input, this mostly happens in linux_stringio.cpp:

There are a couple of important functions - on the 'setup' side there's a Keyboard::CloneDisplay(Display *dpy) for xlib and Keyboard::UseConnection(xcb_connection_t *conn) for xcb which sets up a global connection to use - this is currently all that's implemented, but there's also a Keyboard::AddInputWindow(void *windowHandle) that gets passed the actual window handle, which could optionally be used for filtering keypresses only to when that window has focus.

Then the meat of the code is when Keyboard::GetKeyState(int key) is called, which takes a RENDERDOC_InputButton and translates it to whatever internal keycode and returns a bool about whether or not that key is currently pressed. It looks like the example you posted uses callbacks to get notified of input, which isn't ideal for renderdoc since it's external to the application's main loop and shouldn't try to interfere, so a polling solution is better.

If you require the window handle you'll need to hook eglCreateWindowSurface. At the moment it isn't hooked since there's no need for it. But the main problem I foresee is that EGL doesn't provide you any way of knowing whether the handles you're being passed are xlib, xcb, or wayland. You'd need to be able to determine that to know which codepath to go down.

I don't mind getting to all this myself at some stage, but as I say since it's a niche/experimental platform on linux and I'm not able to test it directly then it's not high priority, so any pull requests would be welcome. Check the CONTRIBUTING.md for more information on code and patch formatting etc and feel free to ask any questions.

ebkalderon commented 6 years ago

Thank you so much for all the information! As you requested, I've created all the CMake and globalconfig.h infrastructure for detecting and linking to libwayland-client and libwayland-egl in a new branch based on v1.x. I will push that up to a personal repo fork soon for you to examine. However, I noticed a few complications with Wayland I thought I should mention.

First issue: Wayland is a heavily callback-based API for reasons of thread safety, so there is no polling version available at all AFAICT. I don't think it would interfere with RenderDoc's main loop since the client application is the one updating it and pumping messages; we're just listening to events. To implement support for Wayland, I think we could do the following:

  1. Store the keyboard state as a collection of pressed keys (probably rdctype::array<RENDERDOC_InputButton>) inside the Keyboard namespace.
  2. Rregister a keyboard listener callback with a wl_display that updates this array of pressed keys whenever it wants.
  3. When Keyboard::GetKeyState(int key) gets called, it checks the array of pressed keys and returns true or false.

Second issue: as a security measure, Wayland is very reluctant to give arbitrary clients access to other clients' keyboards. Since we are literally handing over a wl_surface * and/or a wl_display * as a WindowHandle to RenderDoc and not trying to access it from the outside (e.g. by talking to the display server directly), there shouldn't be any problems with this. But again, I've never written any Wayland applications before, so I wouldn't know until I try.

baldurk commented 6 years ago

Sounds good, you can of course PR the support in multiple parts, especially if it's behind a feature flag that defaults to off then there's no harm in bringing it in piecemeal.

Regarding the callbacks, if that's the only way to implement it then fair enough, but I remain a little skeptical that it will work without problems. I'd expect the library to work great when within the application doing its own main loop, but RenderDoc injected doing its own thing often breaks the assumptions of libraries that only expect a single application to be using them. While injected RenderDoc has no main loop of its own - it will be checking for keypresses inside eglSwapBuffers whenever that happens.

Either way any work you do on this is welcome, let me know when you have a PR to look at and we can take it from there.

ebkalderon commented 6 years ago

So, it turns out that the most common way to detect a Wayland session over an X11 session is to simply check for the existence of a DISPLAY environment variable (X11) or WAYLAND_DISPLAY (Wayland). I'm not sure if I could call this 100% reliable, but I can't seem to find any other examples of runtime detection.

EDIT: For context, I'm currently trying to adapt eglplatform.h to conditionally select the correct EGLNative{Display,Pixmap,Window}Type definitions.

baldurk commented 6 years ago

That seems really sketchy and unreliable :disappointed:. I worry that there will be cases where you'll have both variables set and be unable to decide.

You shouldn't modify the eglplatform.h file, that comes from the upstream official EGL headers. Presumably you could just cast to the type you need? How do normal wayland programs cast their types to the EGL types?

ebkalderon commented 6 years ago

Usually, Wayland applications use the wayland-egl.h header to abstract over EGL. Thankfully, a simple C-style cast to EGLNativeWindowType seems to compile correctly for me, so no issues there. The real problem now is runtime display server detection.

According to this blog post by Martin Graesslin of KDE fame, currently working on porting applications from X11 to Wayland, a common solution with Qt-based applications is something like:

#if HAVE_X11
    if (QX11Info::isPlatformX11()) {
        callX11SpecificImplementation();
    }
#endif
#if HAVE_WAYLAND
    if (QGuiApplication::platformName().startsWith(
            QLatin1String("wayland"), Qt::CaseInsensitive)) {
        callWaylandSpecificImplementation();
    }
#endif

While this should work well for qrenderdoc, I don't know how to handle renderdoccmd in this instance.

baldurk commented 6 years ago

Handling renderdoccmd is nice but probably not super important - the display connection in that case is only used for a couple of commands that will create previews of captures. If that didn't work on wayland it wouldn't be the end of the world, most of its use is command-line only.

ebkalderon commented 6 years ago

I've been looking further into how applications detect X11 and Wayland sessions at run-time, and I can't find any other way except for checking for DISPLAY and WAYLAND_DISPLAY at startup (or XDG_SESSION_TYPE on some desktops). 😞 This is quite disappointing; I wish there was a way to tell from the handles being passed from the injected application.

ebkalderon commented 6 years ago

Also, it looks like Keyboard::{Add,Remove}InputWindow isn't being called consistently for all drivers when I use the Launch Application tab. Despite adding the function calls into RenderDoc::{Add,Remove}FrameCapturer, they don't seem to get called when I start a Wayland program through that tab, only after I quit the program and enter replay mode. Even then, the GL driver seems to call AddInputWindow many times, even when RenderDoc is quitting, confusingly enough.

Additionally, I noticed that there are two separate arrays for m_DeviceFrameCapturers and m_WindowFrameCapturers. I can easily call the Keyboard methods on the latter, but not the former since there is no window handle to use.

Any insight on how to solve these issues would be very helpful!

baldurk commented 6 years ago

You should not call any of the keyboard functions from inside {Add,Remove}FrameCapturer. The frame capturers are related but orthogonal to what you want to do, they identify the pairs that are used to trigger captures either internally or from the API. For example the m_DeviceFrameCapturers are used for when there are no windows registered at all, and the API is called with a NULL window handle, so it doesn't really make sense to figure out a way to register a window for it.

Also FYI the keyboard input window functions should only be called during capture, they're unused during replay (although maybe some common code still calls them it's not used for anything). If there's an issue getting GL to call the functions consistently then it should be fixed at the source in the GL driver.

I suspect the issue is because GL's context handle is a complete mess and is very difficult to actually untangle after the fact. At the moment AddInputWindow is called from WrappedOpenGL::ActivateContext which is equivalent to egl/glXMakeCurrent. That's the first point at which we can actually tell a window is being used with GL. Similarly, RemoveInputWindow isn't called at all because there's no corresponding unregister that makes sense.

So the question is - when do you want the information about window use and what's it going to feed into ultimately? That will determine what we can do, considering awkward edge cases. The current code on the PR doesn't use the window at all and seems to get the input from the display directly - is this only to implement filtering so that only input when the window is focussed gets considered?

ebkalderon commented 6 years ago

Okay, I'll remove those Keyboard functions from {Add,Remove}FrameCapturer. But it's very difficult for me to tell from reading the source where the keyboard input is supposed to occur, since doesn't seem to be a set rhyme or reason where these functions are supposed to be called, so any insight would be welcome.

If you check the current state of the PR, it does make use of both the wl_display and wl_surface. The wl_surface given to AddWaylandInputWindow gets passed into the listener callbacks as a user data argument, and gets checked against every time the callback triggers to make sure we're filtering the correct window. This filtering code isn't implemented yet, but it doesn't really matter at the moment, since I can't even get the keyboard input to register.

Ultimately, during capture mode, I want the currently active window to get registered with AddInputWindow so that GetKeyState(int key) can work for m_CaptureKeys. When the active window changes, I want the old window to get deregistered with RemoveInputWindow and have the new one take its place with AddInputWindow. Does this make sense?

baldurk commented 6 years ago

it's very difficult for me to tell from reading the source where the keyboard input is supposed to occur, since doesn't seem to be a set rhyme or reason where these functions are supposed to be called, so any insight would be welcome.

Yeh this is unfortunately a consequence of how OpenGL's design is a problem rather than intentional. We can look at vulkan for a better example of how the flow should be:

  1. When a VkSurfaceKHR is created, we call AddInputWindow for the underlying window.
  2. When a VkSurfaceKHR is destroyed, we call RemoveInputWindow (if there was a window for that surface).
  3. (Non vulkan specific) The keyboard input is checked in RenderDoc::Tick() which happens from the API's present call. In all other implementations this is a simple poll of the current state, for wayland this will poll against the cached data which is updated by callbacks.

D3D11 and D3D12 work the same way - the input windows are tied to swapchain creation/destruction.

On GL this flow is complicated by steps 1 and 2 being not direct. Windows are attached and detached from contexts arbitrarily and there's no evident lifetime - it's all implicit. Currently the frame capturers work by adding for a window when we first present with it with a given context, and then just kind of.... timing out after 5 seconds :disappointed:.

We can't call AddInputWindow directly in the same place because we need to be able to process input even if there's no active presentation happening (I can go into more detail on this if you want but I don't want to get side-tracked). Instead we call it when the context first becomes current and then just leave it alone. This isn't ideal but should at least give us a set of windows to filter for.

If you check the current state of the PR, it does make use of both the wl_display and wl_surface.

OK fair enough, as mentioned on the comments there I hadn't checked the PR's most recent changes until I understood how you wanted to use it and if it was in a state to review yet. It looks like you're doing essentially the same as the windows path - fetching global input and then simply filtering the results by a window whitelist, so that should be fine.

Ultimately, during capture mode, I want the currently active window to get registered with AddInputWindow so that GetKeyState(int key) can work for m_CaptureKeys. When the active window changes, I want the old window to get deregistered with RemoveInputWindow and have the new one take its place with AddInputWindow. Does this make sense?

It makes sense but it can't work like this. The active window has a particular meaning in RenderDoc - hitting the capture shortcut will begin a capture at the next present of the active window, and end the capture at the present after that. This is done specifically for the case where you have many windows presenting at different frequencies with different workloads and you want to be sure you capture the one you're interested in. There's a shortcut to cycle between all windows choosing which one is active.

For that reason, we can't limit keyboard input only to the active window. If the active window isn't visible currently or isn't presenting we need to be able to cycle away from it with keyboard input to one of the other windows. So we do need to register a set of windows and whitelist input from any of them.

So I think what needs more investigation is why AddInputWindow is not being called consistently for you from where it currently sits in WrappedOpenGL::ActivateContext. It's obviously not ideal that the windows aren't removed again, but that can be treated separately - once we have the right calls to AddInputWindow to populate the list, we can figure out when and where we can call RemoveInputWindow to remove items from the list.

Let me know if you need any more info about any of that. I can talk your ear off about reasoning but I don't want to side-track and ramble about something unimportant for ages.

baldurk commented 5 years ago

I've added experimental support in that commit. I was barely able to get Wayland working as it still seems like it's quite immature, but in a simple test I could capture from GL, GLES and Vulkan on Wayland with that commit. The cmake option is off by default so you need to build with it enabled explicitly.

Replaying on Wayland is still not working as I think there are a bunch of problems to solve there, but since it seems like xcb works fine on Wayland this doesn't seem important.

Any further testing or improvements are still welcome but note that this should still be considered unsupported for the time being, hence I'm leaving this issue open.

ppaalanen commented 4 years ago

Hi,

I quickly read through the this issue and I'd like clarify or confirm a few things.

It's not so much that Wayland uses EGL, but EGL is the only glue API for any OpenGL flavour that supports Wayland. Hence all Wayland applications use EGL for getting at OpenGL. The flavour of OpenGL is not limited in any way. You can use whatever you want: desktop OpenGL both compat and core profiles, any version; OpenGL ES 1, 2, 3 whatever. EGL implementations might vary, but Mesa supports everything.

What does EGL do in the Wayland stack is still a good reference. The major difference today is that wl_drm has been changed to zwp_linux_dmabuf_v1 Wayland protocol interface, but otherwise it is still quite accurate. wl_egl_window_create() is an important detail. AFAIK, Mesa still supports the exact way described in the post.

Detecting the underlying native window system object type from EGLNativeDisplayType and EGLNativeWindowType is very tricky and not always even possible. Modern EGL headers define these two types as generic void * and uintptr_t. Hence EGL_EXT_platform_base was created. See also EGL_KHR_platform_wayland. If you really have to autodetect the EGLNativeDisplayType, copy the code from Mesa - that is the best known implementation. Once you know the platform, the window type is fixed, at least on Wayland.

Knowing whether to connect to a Wayland or X11 display is actually a hard problem, see https://gitlab.gnome.org/GNOME/gtk/issues/1741 .

Wayland is just a set of protocol specifications and a pair of small IPC libraries really. When people complain that Wayland doesn't work, the reason in almost all cases is in the implementations: app toolkits, each Wayland server/compositor, EGL implementation, etc.

Libwayland IPC API was first decided to be callback function based. Thread-safety was demanded for the client side later, which explains some not-so-straightforward API design. Libwayland-client does not run a hidden event loop internally, it still depends on the caller to poll() the connection fd, and call entrypoints that will queue and/or dispatch Wayland events to the callback functions. Hence the time and context of dispatch can be carefully controlled. When a process comprises of multiple components that share a Wayland connection, each additional component (and thread!) should create its own wl_event_queue and assign objects to that. Doing so will allow the component to decide when it will dispatch its own events, and you can also poll/read to the connection fd at your own time.

On Wayland, all objects you can access are restricted to those that you created on that very Wayland connection. Wayland IPC simply has no way to reference an object on a different connection. (People can and have written specific Wayland extensions to export global handles and then refer to them, but that is in no way generic, and requires explicit use from all parties.) Therefore it is very important to do everything on the same Wayland connection (wl_display of libwayland-client). There is no way to even attempt to get e.g. input for another connection's window. You cannot post content to a window from a different connection, and so on.

A Wayland connection cannot be shared between processes, so everything touching the connection must always be in-process.

OTOH, a process can use as many Wayland connections as it wants, that will work fine. They will just be all isolated from each other. Passing a Wayland object from a wrong connection will probably result in an abort or a fatal protocol error, sooner or later.

Input handling on a connection shared between multiple components/libraries/parties has a catch. When you subscribe to e.g. keyboard events, you do so without specifying a window. The keyboard protocol interface has "focus" events that tell you which window the key events are targeting. On a shared connection, multiple components can listen for input events on the same window, they both see the same events. The catch is, the focus change handling code must be prepared to deal with "unknown" windows - a component must be able to detect that input focus is shifting into a window it did not create. If the code blindly assumes that all windows are its own, it will dereference the user data pointer expecting it to point to something it is not. More information is in Add API to tag proxy objects. This may not be a new problem you to you since you are already looking at windows created by other components, but I thought to mention it.

To recap from above, subscribing to input events will deliver you only events for your own windows (those that were created on this Wayland connection).

I hope this gives a little more insight. I'm happy to answer further questions about Wayland, but I might not have time to review patches.

baldurk commented 4 years ago

I think what you're talked about has already been addressed in the commit above. I set up something for keyboard input which at least works in my extremely limited testing, and capturing via normal methods and the in-application API works. As mentioned above, using wayland at replay time currently doesn't work at all.

There may be other bugs I don't know about since I was barely able to get wayland working so the testing I did was very rudimentary. It will remain disabled by default and I have no plans to come back to this support, so this issue will stay open in case someone wants to fix the remaining bugs, test it, and support it.

okias commented 2 years ago

@baldurk few years passed, Fedora using Wayland by default for some time, Ubuntu 22.04 default to Wayland too.

Is there chance you can re-try with Wayland and reconsider? Thank you

eszlari commented 1 year ago

As a prerequisite I think it would make sense to first port to Qt6, which includes some important Wayland fixes.

okias commented 1 year ago

apitrace can run on both Qt5 and Qt6 and works fine with Qt5 + Wayland. I mean, I assume Linux isn't 1st tier, but still having to do

WAYLAND_DISPLAY= QT_QPA_PLATFORM=xcb qrenderdoc

isn't very convenient.

HildarTheDorf commented 6 months ago

"Debugging wayland applcations" and "running qrenderdoc on wayland" seem to be orthogonal issues but are both gated behind the same feature. Running qrenderdoc with QT_QPA_PLATFORM=xcb and debugging a wayland application seems to work fine, while running qrenderdoc itself on wayland is indeed a buggy mess.

Could these features be separated, so we have ENABLE_WAYLAND as a separate feature to ENABLE_UNSUPPORTED_EXPERIMENTAL_POSSIBLY_BROKEN_QT_WAYLAND_PLATFORM?

DistortedDragon1o4 commented 4 months ago

Debugging does not work on wayland when using GLFW 3.4 and glad 2.0.6. Renderdoc says "Unknown window" in the overlay.

Voresh commented 4 months ago

@DistortedDragon1o4 I had the same issue - you can try to use -D GLFW_BUILD_X11=1 -D GLFW_BUILD_WAYLAND=0 CMake options - this worked for me. https://www.glfw.org/docs/latest/compile_guide.html Then also launch renderdoc with QT_QPA_PLATFORM=xcb environment variable.