This would be awesome
This functionality is indeed available in the latest nVidia driver, but I don't have it fully working yet. I can access the GPU device through EGL without an X server, create a Pbuffer, and (seemingly) render something to it, but I can't make glReadPixels() function properly, and I'm a little fuzzy on how double buffering and stereo can be implemented, as it seems like EGL doesn't support double buffered or stereo Pbuffers. Emulating double buffering and stereo using multiple single-buffered Pbuffers is certainly possible, but it would greatly increase the complexity of VirtualGL. Waiting for feedback from nVidia.
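For context, a minimal sketch of the kind of headless access being described here, assuming the EGL_EXT_device_enumeration and EGL_EXT_platform_device extensions are available (this is illustrative only, not VirtualGL code, and error handling is mostly omitted):

```cpp
// Build (assumption): g++ headless.cpp -lEGL -lGL
// Renders one frame to a Pbuffer on the first EGL device, with no X server.
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GL/gl.h>
#include <cstdio>

int main()
{
  // The device-platform entry points are extensions, so fetch them at run time.
  PFNEGLQUERYDEVICESEXTPROC queryDevices =
    (PFNEGLQUERYDEVICESEXTPROC)eglGetProcAddress("eglQueryDevicesEXT");
  PFNEGLGETPLATFORMDISPLAYEXTPROC getPlatformDisplay =
    (PFNEGLGETPLATFORMDISPLAYEXTPROC)eglGetProcAddress("eglGetPlatformDisplayEXT");
  if (!queryDevices || !getPlatformDisplay) {
    fprintf(stderr, "EGL device extensions unavailable\n");  return 1;
  }

  EGLDeviceEXT devices[8];  EGLint numDevices = 0;
  queryDevices(8, devices, &numDevices);
  if (numDevices < 1) { fprintf(stderr, "No EGL devices found\n");  return 1; }

  EGLDisplay dpy = getPlatformDisplay(EGL_PLATFORM_DEVICE_EXT, devices[0], NULL);
  eglInitialize(dpy, NULL, NULL);
  eglBindAPI(EGL_OPENGL_API);

  const EGLint configAttribs[] = {
    EGL_SURFACE_TYPE, EGL_PBUFFER_BIT, EGL_RENDERABLE_TYPE, EGL_OPENGL_BIT,
    EGL_RED_SIZE, 8, EGL_GREEN_SIZE, 8, EGL_BLUE_SIZE, 8, EGL_NONE
  };
  EGLConfig config;  EGLint numConfigs = 0;
  eglChooseConfig(dpy, configAttribs, &config, 1, &numConfigs);

  const EGLint pbAttribs[] = { EGL_WIDTH, 256, EGL_HEIGHT, 256, EGL_NONE };
  EGLSurface pb = eglCreatePbufferSurface(dpy, config, pbAttribs);
  EGLContext ctx = eglCreateContext(dpy, config, EGL_NO_CONTEXT, NULL);
  eglMakeCurrent(dpy, pb, pb, ctx);

  // Render something trivial and read it back.
  glClearColor(1.0f, 0.0f, 0.0f, 1.0f);
  glClear(GL_COLOR_BUFFER_BIT);
  glFinish();
  unsigned char pixel[4] = { 0 };
  glReadPixels(0, 0, 1, 1, GL_RGBA, GL_UNSIGNED_BYTE, pixel);
  printf("Read back pixel: %d %d %d %d\n", pixel[0], pixel[1], pixel[2], pixel[3]);

  eglMakeCurrent(dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT);
  eglTerminate(dpy);
  return 0;
}
```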
After discussing at length with nVidia, it appears that there are a couple of issues blocking this:
ISSUE:
SOLUTION:
ISSUE:
POSSIBLE SOLUTIONS:
Simple program to demonstrate OpenGL rendering without an X server:
git clone https://gist.github.com/dcommander/ee1247362201552b2532
Popping the stack on this old thread, because I've started re-investigating how best to accomplish this, and I've been tinkering with some code over the past few days to explore what's now possible, since it's been two years since I last visited it. AFAICT (awaiting nVidia's confirmation), the situation is still the same with respect to EGL, which is that multi-view Pbuffers don't exist. That leaves us with the quandary of how to emulate double-buffered and quad-buffered (stereo) rendering, presumably by using FBOs behind the scenes. That would require:

- Interposing glReadBuffer(), glDrawBuffer(), glDrawBuffers(), glNamedFramebufferReadBuffer(), and glNamedFramebufferDrawBuffer() (VGL already interposes glDrawBuffer()) and redirecting GL_FRONT, GL_BACK, GL_FRONT_AND_BACK, etc. to the appropriate GL_COLOR_ATTACHMENTx target (in the case of GL_FRONT_AND_BACK, this would require calling down to glDrawBuffers().) Fortunately, it appears as if it is an error to call glDrawBuffer() or glReadBuffer() with a target of GL_BACK/GL_FRONT/etc. whenever an FBO other than 0 is bound, so VirtualGL can similarly trigger an OpenGL error if those targets are used without the Drawable FBO being bound. (See the sketch below.)
- Interposing glBindFramebuffer() in order to redirect Buffer 0 to the Drawable FBO.
- Interposing glGet*() in order to return values for GL_DOUBLEBUFFER, GL_DRAW_BUFFER, GL_DRAW_BUFFERi, GL_DRAW_FRAMEBUFFER_BINDING, GL_READ_FRAMEBUFFER_BINDING, GL_READ_BUFFER, and GL_RENDERBUFFER_BINDING that make sense from the application's point of view.
- Interposing glXChooseVisual(), glXChooseFBConfig(), and similar functions, and returning VirtualGL's own internal structure pointers to the application when the application requests a GLXFBConfig. This is feasible, but it's difficult and fraught with potential compatibility issues.
- Emulating Pbuffer attributes such as GLX_PRESERVED_CONTENTS (hopefully we don't need to? Otherwise, I have no clue), GLX_MAX_PBUFFER_WIDTH and GLX_MAX_PBUFFER_HEIGHT (could map to GL_MAX_FRAMEBUFFER_WIDTH and GL_MAX_FRAMEBUFFER_HEIGHT), and GLX_LARGEST_PBUFFER.

Features that will likely have to be relegated to the legacy GLX back end only:

- glXSelectEvent(). If we have to use FBOs to emulate Pbuffers, then I'm not sure how to emulate this at all. (Bueller? Bueller?)
- GLX_EXT_import_context and indirect contexts in general. EGL has no concept of indirect contexts.
- GLX_NV_swap_group. If we have to use FBOs to emulate Pbuffers, then this extension may not be possible to emulate at all.

As you can see, this is already a potential compatibility minefield. It at least becomes a manageable minefield if we are able to retain the existing GLX Pbuffer back end and simply add an EGL Pbuffer back end to it (i.e. if a multi-view EGL Pbuffer extension is available.) That would leave open the possibility of reverting to the GLX Pbuffer back end if certain applications don't work with the EGL Pbuffer back end. However, since I can think of no sane way to use FBOs for the EGL back end without also using them for the GLX back end, if we're forced to use FBOs, essentially everything we currently know about VirtualGL's compatibility with commercial applications would have to be thrown out the window. Emulating Pbuffers with FBOs is so potentially disruptive to application compatibility that I would even entertain the notion of introducing a new interposer library just for the EGL back end, and retaining the existing interposers until the new back end can be shown to be as compatible (these new interposers could be selected in vglrun based on the value of VGL_DISPLAY.)
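A rough sketch of the glDrawBuffer() redirection idea from the first item above, assuming a simple double-buffered, non-stereo drawable; emulatedDrawFBO and attachmentForBuffer() are hypothetical names, and this is not VirtualGL's actual interposer code:

```cpp
// Hypothetical interposer fragment illustrating the redirection described above.
// Compile into a preloaded shared library; not VirtualGL's actual code.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <GL/gl.h>
#include <GL/glext.h>
#include <dlfcn.h>

// Hypothetical faker state for the current drawable: the FBO that emulates the
// default framebuffer (attachment 0 = front, attachment 1 = back; mono only).
static GLuint emulatedDrawFBO = 0;        // 0 = not emulating (pass through)

static GLenum attachmentForBuffer(GLenum buf)
{
  switch (buf) {
    case GL_FRONT: case GL_FRONT_LEFT: return GL_COLOR_ATTACHMENT0;
    case GL_BACK:  case GL_BACK_LEFT:  return GL_COLOR_ATTACHMENT1;
    default:                           return GL_NONE;
  }
}

extern "C" void glDrawBuffer(GLenum buf)
{
  // Look up the real functions the first time through.
  static void (*realDrawBuffer)(GLenum) =
    (void (*)(GLenum))dlsym(RTLD_NEXT, "glDrawBuffer");
  static void (*realDrawBuffers)(GLsizei, const GLenum *) =
    (void (*)(GLsizei, const GLenum *))dlsym(RTLD_NEXT, "glDrawBuffers");

  GLint boundFBO = 0;
  glGetIntegerv(GL_DRAW_FRAMEBUFFER_BINDING, &boundFBO);

  // Only redirect when the app thinks it is drawing to the default framebuffer,
  // i.e. when the drawable-emulation FBO is bound.
  if (emulatedDrawFBO && (GLuint)boundFBO == emulatedDrawFBO) {
    if (buf == GL_FRONT_AND_BACK) {
      const GLenum bufs[] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1 };
      realDrawBuffers(2, bufs);            // multi-buffer targets need glDrawBuffers()
    } else {
      realDrawBuffer(attachmentForBuffer(buf));
    }
    return;
  }
  realDrawBuffer(buf);                     // app-created FBO: pass through unmodified
}
```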
Maybe I'm being too paranoid, but in the 13 years I've been maintaining this project, I've literally seen every inadvisable thing that an application can possibly do with OpenGL or GLX. A lot of commercial OpenGL ISVs seem to have the philosophy that, as long as their application works on the specific platforms they support, it doesn't matter if the code is brittle, non-future-proof, or if it only works by accident because the display is local and the GPU is fast. Hence my general desire to not introduce potential compatibility problems into VirtualGL. The more we try to interpose the OpenGL API, the more problems we will potentially encounter, since that API changes a lot more frequently than GLX. There is unfortunately no inexpensive way to test a GLX/OpenGL implementation for conformance problems (accessing the Khronos conformance suites requires a $30,000 fee), and whereas some of the companies reselling VirtualGL in their own products have access to a variety of commercial applications for testing, I have no such access personally.
Relabeling as "funding needed", since there is no way to pay for this project with the General Fund unless a multi-view Pbuffer extension for EGL materializes.
I'm thinking about funding this specific project. How do I do that? I'm happy to discuss offline, including the specifics around amount needed, etc. No corporate agenda other than interest in this feature and willingness to fund it (the OpenGL offload without X server). Thanks!

Leo Reiter
CTO, Nimbix, Inc.
@nimbixler please contact me offline: https://virtualgl.org/About/Contact. At the moment, it doesn't appear that nVidia is going to be able to come up with a multibuffer EGL extension, so this project is definitely doable but is likely to be costly. However, I really do think it's going to be necessary in order to move VGL forward, and this year would be a perfect time to do it.
Pushed to a later release of VirtualGL, since 2.6 beta will land this month and there is no funding currently secured for this project.
Re-tagging as "funding needed." I've completed the groundwork (Phase 1), which is now in the dev branch (with relevant changes that affect the stable branch placed in master.) However, due to budgetary constraints with the primary company that is sponsoring this, it appears that I'm going to need to split the cost of the project across multiple companies in order to make it land in 2019.
The groundwork included:

- A new unit test (servertest, which invokes frameut and fakerut with various permutations of VirtualGL settings)
- Reworking the mechanism that fakerut uses to communicate with the faker. The faker was previously sending back autotest information to fakerut using the environment, but that is not thread-safe, and it was causing sporadic crashes in fakerut's multithreaded test. The faker now exposes special functions that fakerut can load via dlsym() to obtain the autotest data, and the faker now stores that data internally using thread-local variables. (A sketch of this mechanism appears after this comment.)
- Making fakerut generally more robust across different OpenGL stacks (I am personally able to test against the nVidia proprietary driver, the old fglrx/Catalyst AMD proprietary driver, and the VMWare open source driver.)
- Fixing issues that caused fakerut to fail with the fglrx driver (basically legitimate oversights in the faker and fakerut code that the nVidia driver allowed but the fglrx driver didn't)
- Modifying fakerut to work around an issue whereby the fglrx driver creates all Pixmaps as single-buffered despite claiming that double-buffered Pixmap-friendly FB configs are available
- Fixing issues that prevented fakerut and other unit tests from completing successfully when run on a 2D X server screen other than 0
- Other miscellaneous improvements to fakerut

Remaining work includes:

- Implementing the EGL back end, which will be enabled either by passing a DRI device path to VGL_DISPLAY rather than an X display or by setting VGL_DISPLAY to egl
- Modifying vglserver_config so that it can be used to configure only the EGL mode of operation, for those who would rather not use a 3D X server (the options that vglserver_config already takes in order to modify the framebuffer device permissions will apply to EGL)

@nimbixler did you get my e-mail? We could use any funding help you can muster on this.
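A toy sketch of the dlsym()/thread-local autotest channel mentioned in the Phase 1 list above; the function names are hypothetical, and the real VirtualGL interface differs:

```cpp
// Toy illustration of the faker<->test communication pattern described above.
// Build (assumption): g++ autotest.cpp -rdynamic -ldl -pthread
#include <dlfcn.h>
#include <cstdio>
#include <thread>

// --- "Faker" side: would normally live in the interposer library ------------
extern "C" {
  // Autotest data is stored per thread, so concurrent rendering threads cannot
  // stomp on each other the way a shared environment variable can.
  static thread_local int lastFrameCount = 0;

  void fakeSwapBuffers() { ++lastFrameCount; }            // stand-in for real faker work
  int  getAutotestFrameCount() { return lastFrameCount; } // looked up via dlsym()
}

// --- "fakerut" side: look the accessor up at run time -----------------------
int main()
{
  typedef int (*GetFrames)();
  GetFrames getFrames = (GetFrames)dlsym(RTLD_DEFAULT, "getAutotestFrameCount");
  if (!getFrames) { fprintf(stderr, "dlsym: %s\n", dlerror());  return 1; }

  std::thread t1([&] { fakeSwapBuffers();  fakeSwapBuffers();
                       printf("thread 1 frames: %d\n", getFrames()); });
  std::thread t2([&] { fakeSwapBuffers();
                       printf("thread 2 frames: %d\n", getFrames()); });
  t1.join();  t2.join();
  return 0;
}
```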
This is an amazing first step. OpenGL direct rendering without an X server is an essential feature for the HPC world. Let me explain: I'm working with VirtualGL/TurboVNC/noVNC to deploy a remote visualization service on an HPC cluster using a single node for remote viz, because the other nodes are used in compute mode with CUDA and other tools. What does that mean?
If we need to run an Xorg instance for remote viz on a GPU, that GPU cannot easily be shared between compute mode and the X Window System. The user should be aware of certain limitations when handling both activities simultaneously on a single GPU: if no consideration is given to managing both sets of tasks, the system may experience disturbances and hangs in the X Window System, leading to interruptions in X-related tasks such as display updates and rendering.
As a result, the HPC world needs a separate visualization cluster, running a 3D X server on every node, just for this service. That isn't a good approach; the hardware requirements are very large. Sharing the same GPU between the X Window System and GPGPU compute mode would let both clusters be fused into a single layer. In-situ visualization also needs this approach for good performance, and sharing resources across the cluster will minimize costs.
EGL remote hardware rendering and a future WebAssembly service with H.264 coding would be a good combination.
I'm sorry for my poor English. :)
I have been looking at WebAssembly in the context of designing an in-browser TurboVNC viewer. So far, it seems to be not fully baked. I've gotten as far as building libjpeg-turbo (which requires disabling its SIMD extensions, since WASM doesn't support SIMD instructions yet) and LibVNCClient into WebAssembly code and running one of the LibVNCClient examples in a browser, but the WebAssembly sockets-to-WebSockets emulation layer doesn't work properly, and the program locks up the browser.
There is a GitHub project that tries to resolve this issue: a SIMD proposal based on SIMD.js, I think. 🤔
Regarding the EGL back end, I have currently expended hundreds of hours of labor attempting to make it work with FBOs because nVidia refused to implement a multi-view Pbuffer extension for EGL. I am almost to the point of having to declare failure, which will mean that I cannot seek compensation for a good chunk of that labor. Unfortunately, it just appears that renderbuffer objects and render textures cannot be shared among OpenGL contexts, and that makes it impossible to fully use those structures to emulate the features of an OpenGL window or other drawable. If anyone has any ideas, please post them. I'm desperate.
nVidia suggested a couple of ideas:
I'm still awaiting nVidia's response to my questions. I'm starting to lose hope, however. Most of the funding I secured for this feature was contingent upon successfully implementing it. I am currently at $13,000 worth of un-reimbursed labor on the feature, and if I can't figure out how to implement it, then I may be sunk. I don't have the ability to absorb that kind of loss right now. I normally don't engage in speculative blue-sky projects for exactly this reason, but this is also the first time I've ever encountered a hard technical roadblock like this in my 10 years of independent open source software development. I took a calculated risk that it would be possible to solve all of the problems associated with this feature, but the limitations of EGL may just make that impossible unless nVidia is willing to implement a multi-view EGL extension for Pbuffers (which, thus far, they have expressed great reluctance to do.) The other idea I initially presented in https://github.com/VirtualGL/virtualgl/issues/10#issuecomment-163030995 (using multiple Pbuffers to emulate multi-buffering) is a non-starter, since GLX allows applications to render to multiple buffers simultaneously, and that would be impossible to implement if the buffers were really drawables behind the scenes.
As I have had to implement the feature thus far, the EGL back end is already less compatible than the GLX back end, because there is no obvious way to implement:

- glXCopyContext() (rarely used, but it is part of the GLX 1.0 specification)
- GLX_EXT_import_context (also rarely used, but I know of at least one commercial 3D application that uses it)
- GLX_EXT_texture_from_pixmap (this is a big limitation, since this extension is used by compositing window managers)

Some of those may be possible to implement, but I just can't spend much more time on this. I have to at least get to proof-of-concept stage before I can even get paid for most of the work I've done thus far.
If this feature proves impossible, then that doesn't necessarily mean that VirtualGL is at a technical dead end. There are still proposed enhancements to it that would be meaningful, even with a GLX back end. However, the problem is funding. I only have one source of research funding right now, and this feature has largely exhausted it. Given the seeming impossibility of implementing Vulkan support in VirtualGL (which also, BTW, caused me to lose a potential funding source), the writing is pretty much on the wall. VirtualGL will remain useful for a certain class of application, but I also think we're probably approaching the point at which it will be necessary to implement GPU-accelerated remote display in some other way-- possibly by building TurboVNC upon Xwayland, for instance, and thus implementing hardware-accelerated OpenGL directly within the X proxy. There are probably 100 technical reasons why this wouldn't work, however, and even if it would, it is likely to require hundreds of hours of labor. There's a good chance that it would go the way of this feature, i.e. that I wouldn't discover the impassable technical roadblocks until I was hundreds of hours into the project, thus requiring me to eat five figures of labor cost again. Furthermore, such a feature would have the obvious disadvantage of requiring a particular X proxy in order to achieve GPU acceleration. On the surface, that would seemingly benefit me, since it would drive more users toward TurboVNC, but if other X proxies follow suit, then ultimately it would be a net loss for The VirtualGL Project as a whole, since I would only be receiving funded development on TurboVNC and not on both TurboVNC and VirtualGL.
If nVidia's ideas don't pan out, then I don't know much else that can be done here, short of someone putting pressure on them (and/or AMD) to implement a multi-view Pbuffer extension for EGL.
WIP checked into dev.eglbackend branch: https://github.com/VirtualGL/virtualgl/tree/dev.eglbackend
Just found https://www.khronos.org/registry/OpenGL/extensions/EXT/EXT_EGL_image_storage.txt. Will give it a try.
Unfortunately, GL_EXT_EGL_image_storage says that it requires OpenGL 4.2. That may be a show-stopper, since I can't impose that requirement upon OpenGL applications running with VirtualGL. Ugh. The other issue is that I don't think it will be possible to support multisampling with EGLImages, for reasons I described in this thread: https://devtalk.nvidia.com/default/topic/1056385/opengl/sharing-render-buffers-or-render-textures-among-multiple-opengl-contexts/post/5359805/#5359805
At the moment, I consider development on this to be stalled pending further ideas. I'm open to the possibility of using Vulkan if there is a straightforward way to do so, but I have no experience whatsoever with that API, and after extensive googling, I haven't been able to find the information I need regarding how to use Vulkan buffers as backing stores for textures or RBOs.
At the moment, it's starting to appear as if using multiple single-buffered Pbuffers may be the least painful option. Although I can foresee a variety of issues that may prevent that approach from working, I can at least figure out whether it's viable with probably a day or less of work.
Hey @dcommander you got this!
@dcommander Did you figure out if using multiple single-buffered Pbuffers works? What is the current status? Would it be possible to create a first working version which allows to run selected applications e.g. glxspheres64? Will solving this issue also help solve #98? If it would be possible to do visualisations of AI/ML/HPC applications with docker / kubernetes without requiring X11, that would be interesting for a lot of people and may help secure further funding.
I am still trying to secure enough funding to cover my labor to look into the single-buffered Pbuffer approach. (Thank you for the donation, BTW. That certainly does help, and 100% of that money will go toward the aforementioned labor.) I hope to be able to do that work within the next few weeks. I have no idea regarding #98. That is a separate issue, and I haven't had time to look into it. Since that feature isn't specifically funded, my labor to work on it will have to be compensated from the VirtualGL General Fund, which only covers 200 hours/year (shared with TurboVNC.) Since the General Fund is usually exhausted six months into the fiscal year, I have to prioritize its use, and #98 isn't a very high priority right now. My main priority with VirtualGL is to figure out the EGL back end, because if I can reach proof of concept, I can unlock additional funding (which will compensate a lot of the speculative labor I have done already) and testing resources.
The single-buffered Pbuffer approach did not pan out. For a variety of reasons, it would have proven to be a nastier solution than using FBOs, mainly because there was no clean way to implement rendering to multiple buffers simultaneously. GL_FRONT_AND_BACK may not be particularly commonplace, but depending on the buffer configuration, GL_BACK, GL_FRONT, GL_LEFT, and GL_RIGHT can also render to multiple buffers. Supporting that functionality would have required a complex, error-prone, and hard-to-maintain automatic buffer synchronization mechanism.
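To illustrate the fan-out that made this approach unattractive, here is a hypothetical helper (the attachment layout is an assumption, not VirtualGL's actual mapping) that expands a GLX-style buffer name into the set of color attachments it would touch under FBO emulation:

```cpp
// Which color attachments of an emulated drawable does a GLX-style buffer name
// refer to?  Assumed layout: 0=front-left, 1=back-left, 2=front-right,
// 3=back-right, with back/right buffers present only if the drawable is
// double-buffered/stereo.
#include <GL/gl.h>
#include <GL/glext.h>
#include <vector>
#include <cstdio>

std::vector<GLenum> attachmentsForBuffer(GLenum buf, bool doubleBuffer, bool stereo)
{
  const GLenum FL = GL_COLOR_ATTACHMENT0, BL = GL_COLOR_ATTACHMENT1,
               FR = GL_COLOR_ATTACHMENT2, BR = GL_COLOR_ATTACHMENT3;
  std::vector<GLenum> a;
  switch (buf) {
    case GL_FRONT_LEFT:  a = { FL };  break;
    case GL_BACK_LEFT:   if (doubleBuffer) a = { BL };  break;
    case GL_FRONT_RIGHT: if (stereo) a = { FR };  break;
    case GL_BACK_RIGHT:  if (doubleBuffer && stereo) a = { BR };  break;
    case GL_FRONT:       a = { FL };  if (stereo) a.push_back(FR);  break;
    case GL_BACK:        if (doubleBuffer) { a = { BL };  if (stereo) a.push_back(BR); }  break;
    case GL_LEFT:        a = { FL };  if (doubleBuffer) a.push_back(BL);  break;
    case GL_RIGHT:       if (stereo) { a = { FR };  if (doubleBuffer) a.push_back(BR); }  break;
    case GL_FRONT_AND_BACK:
      a = { FL };  if (doubleBuffer) a.push_back(BL);
      if (stereo) { a.push_back(FR);  if (doubleBuffer) a.push_back(BR); }
      break;
  }
  return a;  // more than one entry => the call must be emulated with glDrawBuffers()
}

int main()
{
  // Example: in a double-buffered stereo drawable, GL_BACK targets two buffers.
  printf("GL_BACK targets %zu attachments\n",
         attachmentsForBuffer(GL_BACK, true, true).size());
  return 0;
}
```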
Fortunately, I finally got the information I needed in order to figure out how to use Vulkan to create RBOs backed by non-context-specific GPU memory. I am proceeding down that path and applying for additional R&D funding.
Status update:
Still pursuing the idea of emulating Pbuffers using RBOs backed by Vulkan memory. Will push to the dev.eglbackend branch when I have it working well enough to run GLXspheres. I haven't had a chance to put in much work on it this month due to pressing issues with my other OSS projects.
Funding update:
Total hours spent thus far: 277.6
Estimated hours remaining to productization (slightly hopeful estimate): 60-70
Total: 337.6-347.6

Hours for which funding has already been secured: 167.8
Hours for which funding can be secured upon proof of concept: 71.4
Hours for which funding has been awarded but not yet secured (legal snafu, working on it): 100
Total: 339.2
Update: the aforementioned 100 hours of funding has finally been secured.
Update: while the funding was finally "secured", it hasn't yet been received, so that is currently holding up further development.
The funding was received. This is next in the queue, after some high-priority TurboVNC work that has been promoted to the head of the queue due to the sudden spike in demand for remote work solutions in the U.S.
The Vulkan-based Pbuffer emulator is now building successfully but isn't yet running due to an issue described here: https://forums.developer.nvidia.com/t/sharing-render-buffers-or-render-textures-among-multiple-opengl-contexts/77168/27
So would this enable hardware-accelerated TurboVNC servers without the presence of an underlying X server?
@MadcowD Referring to the diagrams here, this feature would quite simply eliminate the 3D X server and replace the GLX back end (green arrow) with an EGL back end. When used with the EGL back end, VirtualGL would become a GLX emulator rather than a GLX splitter/forwarder. It's not technically accurate to describe this as a TurboVNC feature, since TurboVNC doesn't technically require VirtualGL and vice versa.
I might have figured out how to make this work using clever manipulation of EGL context sharing. Basically, the idea is (and I've verified that this works at the low level):
I'll keep you posted regarding my progress. Fortunately, the infrastructure to test the solution above was largely already developed in the context of prior failed experiments, so hopefully I can get it prototyped within the next week or two. It's potentially messier, in terms of code, than a Vulkan-based solution would have been, but a Vulkan-based solution appears to be a non-starter because of the fact that nVidia's Vulkan implementation seems to require an X display.
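A bare-bones illustration of that context-sharing idea, assuming the same device-platform setup as the earlier sketch (error handling is omitted, and the FBO entry points are assumed to be directly linkable for brevity; a real program would load them via eglGetProcAddress()). The key point is that a renderbuffer created in the "RBO context" is visible by name in the application context, but the FBO itself is not shared:

```cpp
// Build (assumption): g++ sharedrbo.cpp -lEGL -lGL
#define GL_GLEXT_PROTOTYPES
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GL/gl.h>
#include <GL/glext.h>
#include <cstdio>

int main()
{
  // Headless display via the EGL device platform (see the earlier sketch).
  PFNEGLQUERYDEVICESEXTPROC queryDevices =
    (PFNEGLQUERYDEVICESEXTPROC)eglGetProcAddress("eglQueryDevicesEXT");
  PFNEGLGETPLATFORMDISPLAYEXTPROC getPlatformDisplay =
    (PFNEGLGETPLATFORMDISPLAYEXTPROC)eglGetProcAddress("eglGetPlatformDisplayEXT");
  EGLDeviceEXT dev;  EGLint n = 0;
  queryDevices(1, &dev, &n);
  if (n < 1) return 1;
  EGLDisplay dpy = getPlatformDisplay(EGL_PLATFORM_DEVICE_EXT, dev, NULL);
  eglInitialize(dpy, NULL, NULL);
  eglBindAPI(EGL_OPENGL_API);

  const EGLint cfgAttribs[] = { EGL_SURFACE_TYPE, EGL_PBUFFER_BIT,
                                EGL_RENDERABLE_TYPE, EGL_OPENGL_BIT, EGL_NONE };
  EGLConfig cfg;  EGLint nc = 0;
  eglChooseConfig(dpy, cfgAttribs, &cfg, 1, &nc);
  const EGLint pbAttribs[] = { EGL_WIDTH, 64, EGL_HEIGHT, 64, EGL_NONE };
  EGLSurface pb = eglCreatePbufferSurface(dpy, cfg, pbAttribs);

  // "RBO context" that owns the renderbuffers emulating the Pbuffer's buffers.
  EGLContext rboCtx = eglCreateContext(dpy, cfg, EGL_NO_CONTEXT, NULL);
  // Application-requested context, created with rboCtx as its share context.
  EGLContext appCtx = eglCreateContext(dpy, cfg, rboCtx, NULL);

  // Create the RBO in the RBO context.
  eglMakeCurrent(dpy, pb, pb, rboCtx);
  GLuint rbo = 0;
  glGenRenderbuffers(1, &rbo);
  glBindRenderbuffer(GL_RENDERBUFFER, rbo);
  glRenderbufferStorage(GL_RENDERBUFFER, GL_RGBA8, 64, 64);
  eglMakeCurrent(dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT);

  // In the application context, the RBO name is valid, but the FBO must be
  // created here, because framebuffer objects are not shared between contexts.
  eglMakeCurrent(dpy, pb, pb, appCtx);
  GLuint fbo = 0;
  glGenFramebuffers(1, &fbo);
  glBindFramebuffer(GL_FRAMEBUFFER, fbo);
  glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                            GL_RENDERBUFFER, rbo);
  printf("FBO status in app context: 0x%x (0x%x = complete)\n",
         glCheckFramebufferStatus(GL_FRAMEBUFFER), GL_FRAMEBUFFER_COMPLETE);

  eglMakeCurrent(dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT);
  eglTerminate(dpy);
  return 0;
}
```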
I can't seem to catch a break on this. I was making progress last week, but due to an unforeseen circumstance related to COVID-19, I have to move my office/lab over the next few days (a few weeks ahead of schedule), then I have to do my taxes for next week's deadline and fix some high-priority bugs that were just reported. I promise I'll get back to this research ASAP. I'm doing my best to keep about five balls in the air right now.
The EGL context sharing idea is implemented and builds successfully, and GLXspheres works at the GLX level with no errors. I'm currently trying to sort out the emulation of glDrawBuffer() and glReadBuffer() so that GLXspheres will work at the OpenGL level as well (i.e. so it will actually produce an image.) I feel like I'm a few hours away from that, so hopefully I'll be able to declare a proof of concept early this coming week. The next step after getting GLXspheres to work will be getting fakerut to work, then I'll push the code and let people test the pre-release build with their applications of choice.
GLXspheres is working! Lots of work left to do, but the concept seems to be solid.
fakerut is passing all the way through the stereo readback heuristics tests, which means that the concept of multi-buffered Pbuffer emulation using RBOs is resoundingly proven.
Another roadblock, unfortunately. Due to the GLX function call semantics, I was taking the approach of creating a single "RBO context" for every GLXFBConfig and sharing that RBO context with any OpenGL contexts that the 3D application requested to create with that GLXFBConfig. That allowed me to create and swap the RBOs independently of the application-requested contexts, which is necessary to properly emulate glXCreatePbuffer() and glXSwapBuffers(). Unfortunately, however, I discovered (experimentally-- I couldn't find any documentation to support this) that the RBO context has the same concurrency limitations as the application-requested contexts. That is, it can only be current in one thread at a time. Thus, I encountered a bunch of OpenGL data races when multiple threads tried to render to independent Pbuffers created with the same GLXFBConfig-- because, even though those threads had their own contexts, all of those contexts were sharing the same RBO context.
Ugh. I'm going to have to ponder how best to work around this problem. Ideas I had:

- Creating a separate RBO context for each Pbuffer and sharing it with the application-requested contexts that render to that Pbuffer. The problem is that application-requested contexts are created in glXCreate*Context*(), and we don't know at that point which drawable the application-requested context will be bound to.
- Deferring the actual creation of the application-requested context rather than creating it in glXCreate*Context*(). The application-requested context would actually be created on first use and shared with the Pbuffer-specific RBO context in the body of glXMake*Current(). However, that's also problematic, because nothing in GLX prevents an application-requested context from being bound to a completely different Pbuffer, and such would require me to somehow unshare the context with one Pbuffer's RBO context and re-share it with another Pbuffer's RBO context.

This strikes at the heart of the problem of how to emulate a non-context-specific construct using context-specific constructs. I'm going to have to either limit the EGL back end to single-threaded applications or return to the drawing board. Unfortunately, I'm now 40 hours over funding-- even including the funding that was preconditioned on a proof of concept (meaning that I haven't secured it yet.)
Ignore most of the previous comment. I am sleep-deprived and forgot that shared contexts do not share the actual rendering state. Since my implementation ensures that any access or modification of the shared RBO handles is mutexed, as is any operation involving the RBO context, it seems as if my implementation is not to blame for most of the concurrency issues. I rewrote the multithreaded rendering tests in fakerut using raw EGL, with no shared contexts, and I see the same EGL data races there. I even tried using a completely different EGLDisplay for each thread, and I still see EGL data races. They appear to be unavoidable issues in nVidia's EGL implementation. Thus, I'll try to work around them as much as possible and move forward.
Can you elaborate a bit more on the impact? Will this be a showstopper or do you think you can go ahead with releasing a preview version? Also, is the implementation only working on nVidia GPUs or should it work for other GPUs as well?
Currently only nVidia supports EGL device access. I have contacted AMD and encouraged them to support it as well.
I will still release a preview version. I'm just still experimenting to figure out how best to work around the concurrency issues.
The worst case is that the preview version will not support multithreaded OpenGL rendering at all. I'm hoping I can find a better solution than that, though.
Is multithreaded OpenGL rendering a common use case in your experience? Or is it more of an exception?

Leo
To be clear, when I say "multithreaded OpenGL rendering", I don't mean parallel rendering. I'm testing the implementation's ability to render to multiple "virtual windows" (Pbuffers) simultaneously with one OpenGL context per window and also to handle X window resize events that are initiated from a different thread than the rendering thread. I don't have a good sense of whether many applications actually do that, but those tests are mainly a measure of the stability of the implementation.
I went down this rabbit hole because the multithreading tests in fakerut were failing in sporadic ways, including:
1. eglMakeCurrent() sometimes returns EGL_FALSE (but annoyingly, eglGetError() returns EGL_SUCCESS when that happens, making it difficult to diagnose the failure.)
2. glClear() usually fails to clear one of the buffers to the correct color, which causes the rendering correctness check in TestThread::run() to fail for one or more threads.

When I refactored the multithreaded rendering tests using raw EGL and ran the tests through helgrind, I saw multiple data races in libnvidia-glsi and libEGL_nvidia, but neither of the aforementioned symptoms occurred. I was able to isolate (1) above and reproduce it consistently even with a single-threaded case, so I need to solve that problem before I can make any judgment regarding whether the fakerut issues are due to the EGL data races or something else.
Long story short: this is a quickly-evolving situation, so I'll keep you posted once I find out more.
Is there a way to test the driver in a single-threaded application right now?
Yes, the driver is just nVidia's standard driver (I'm using the latest-- 450.xx.) It installs the EGL libraries automatically.
If you mean the EGL back end I'm working on, no. I want it to pass fakerut before I push it for testing.
Well, I think I at least solved (1) (with a 1-line fix.) Turns out that EGL really does not like it if you try to bind a surface to a context in one thread without unbinding it in another thread first. That was apparently the source of the cryptic eglMakeCurrent() error. Still trying to figure out (2).
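For anyone who hits the same wall, here is a stripped-down illustration of the binding rule behind that 1-line fix (using eglGetDisplay(EGL_DEFAULT_DISPLAY) for brevity; a headless program would use the device platform as in the earlier sketches):

```cpp
// Build (assumption): g++ rebind.cpp -lEGL -pthread
#include <EGL/egl.h>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <cstdio>

int main()
{
  EGLDisplay dpy = eglGetDisplay(EGL_DEFAULT_DISPLAY);
  eglInitialize(dpy, nullptr, nullptr);
  eglBindAPI(EGL_OPENGL_API);
  const EGLint cfgAttribs[] = { EGL_SURFACE_TYPE, EGL_PBUFFER_BIT,
                                EGL_RENDERABLE_TYPE, EGL_OPENGL_BIT, EGL_NONE };
  EGLConfig cfg;  EGLint n = 0;
  eglChooseConfig(dpy, cfgAttribs, &cfg, 1, &n);
  const EGLint pbAttribs[] = { EGL_WIDTH, 64, EGL_HEIGHT, 64, EGL_NONE };
  EGLSurface pb = eglCreatePbufferSurface(dpy, cfg, pbAttribs);
  EGLContext ctxA = eglCreateContext(dpy, cfg, EGL_NO_CONTEXT, nullptr);
  EGLContext ctxB = eglCreateContext(dpy, cfg, EGL_NO_CONTEXT, nullptr);

  std::mutex m;  std::condition_variable cv;  bool released = false;

  std::thread a([&] {
    eglMakeCurrent(dpy, pb, pb, ctxA);
    // ... render ...
    // Release the surface before another thread binds it.  Skipping this
    // release is the kind of mistake that produced the cryptic failure
    // described above.
    eglMakeCurrent(dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT);
    { std::lock_guard<std::mutex> l(m);  released = true; }
    cv.notify_one();
  });

  std::thread b([&] {
    std::unique_lock<std::mutex> l(m);
    cv.wait(l, [&] { return released; });
    if (!eglMakeCurrent(dpy, pb, pb, ctxB))
      fprintf(stderr, "eglMakeCurrent() failed in thread B\n");
    eglMakeCurrent(dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT);
  });

  a.join();  b.join();
  eglTerminate(dpy);
  return 0;
}
```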
The news is better. (2) was a two-pronged bug, and I've managed to fix one prong (a bug in the mapping of external read and draw buffer IDs to RBOs in the EGL back end's emulated version of glXMakeContextCurrent().) Still investigating the other prong.
All concurrency issues fixed! Apparently the races in nVidia's EGL implementation were innocuous. The second prong of (2) was a bug in the EGL back end's emulated version of glXSwapBuffers(). Proceeding with code cleanup and review.
The EGL back end has been pushed to the dev branch and is now available in the dev/3.0 evolving pre-release build.
Care and feeding notes:
- Re-run vglserver_config, even on existing VirtualGL servers. Select the appropriate option depending on whether you want to use both the GLX and EGL back ends or just the EGL back end.
- The EGL back end can be disabled at build time by passing -DVGL_EGLBACKEND=0 to CMake.
- To use the EGL back end, use the VGL_DISPLAY environment variable or the -d argument to vglrun to specify a DRI device path, e.g. /dev/dri/card0.
- In verbose mode (VGL_VERBOSE=1 or vglrun +v), VGL will print "Opening EGL device {device path}" to the console. This is a convenient way to verify that the EGL back end is in use. You can also temporarily stop the 3D X server if you want to be really sure.

Testing I've performed:

- Running the unit tests under valgrind (--leak-check=full) and helgrind (thread safety checking), and running fakerut -nocopycontext with both the GLX and EGL back ends on my nVidia machine

Things that don't work yet:

- glXCopyContext(). I think this can straightforwardly be made to work by borrowing some code from Mesa.
- GLX_EXT_texture_from_pixmap-- I need to look into this one, but at first glance, it should be possible. It will probably just require some method of transferring pixels between an EGL Pbuffer surface and a Pixmap on the 2D X server.

Things that won't work:

- OpenCL/OpenGL interoperability (CL_EGL_DISPLAY_KHR)

Refer to the commit log for other notes.
At this point, I have spent approximately 100 hours more than there is available funding for. Many thanks to all who have donated and sponsored this feature thus far. If you have use for this feature and have not donated, please consider doing so. I am obligated to finish the feature on behalf of those who have sponsored it thus far, but I wasn't anticipating having to eat that much labor cost. That overage is due to numerous false starts, including being sent down the garden path vis-a-vis Vulkan (which couldn't work due to the fact that nVidia's implementation requires an X server) and numerous issues I encountered in the process of implementing the feature (including all of the aforementioned concurrency issues-- did I mention that emulating double-buffered and quad-buffered Pbuffers using FBOs is frickin' hard?!)
The good news is that this code is beyond proof-of-concept quality at this point. It's basically beta-quality, minus the two missing features and minus documentation.
That's really great news!
I tested glxgears on my laptop and it worked.
On a server, I got an error though:
$ DISPLAY=:3 vglrun +v -d /dev/dri/card4 glxgears -info
[VGL] Shared memory segment ID for vglconfig: 28901396
[VGL] VirtualGL v2.6.80 64-bit (Build 20200826)
[VGL] Opening EGL device /dev/dri/card4
[VGL] WARNING: Could not set WM_DELETE_WINDOW on window 0x00200002
GL_RENDERER = GeForce GTX 1080 Ti/PCIe/SSE2
GL_VERSION = OpenGL ES 1.1 NVIDIA 418.74
GL_VENDOR = NVIDIA Corporation
GL_EXTENSIONS = GL_EXT_debug_label GL_EXT_map_buffer_range GL_EXT_robustness GL_EXT_texture_compression_dxt1 GL_EXT_texture_compression_s3tc GL_EXT_texture_format_BGRA8888 GL_KHR_debug GL_EXT_memory_object GL_EXT_memory_object_fd GL_EXT_semaphore GL_EXT_semaphore_fd GL_NV_memory_attachment GL_NV_texture_compression_s3tc GL_OES_compressed_ETC1_RGB8_texture GL_EXT_compressed_ETC1_RGB8_sub_texture GL_OES_compressed_paletted_texture GL_OES_draw_texture GL_OES_EGL_image GL_OES_EGL_image_external GL_OES_EGL_sync GL_OES_element_index_uint GL_OES_extended_matrix_palette GL_OES_fbo_render_mipmap GL_OES_framebuffer_object GL_OES_matrix_get GL_OES_matrix_palette GL_OES_packed_depth_stencil GL_OES_point_size_array GL_OES_point_sprite GL_OES_rgb8_rgba8 GL_OES_read_format GL_OES_stencil8 GL_OES_texture_cube_map GL_OES_texture_npot GL_OES_vertex_half_float
VisualID 33, 0x21
[VGL] ERROR: in readPixels--
[VGL] 346: GL_ARB_pixel_buffer_object extension not available
$ ll /dev/dri/
total 0
drwxr-xr-x 2 root root 240 May 26 10:31 ./
drwxr-xr-x 20 root root 3940 Aug 25 12:00 ../
crw-rw---- 1 root users 226, 0 May 26 10:31 card0
crw-rw---- 1 root users 226, 1 May 26 10:31 card1
crw-rw---- 1 root users 226, 2 May 26 10:31 card2
crw-rw---- 1 root users 226, 3 May 26 10:31 card3
crw-rw---- 1 root users 226, 4 May 26 10:31 card4
crw-rw---- 1 root users 226, 64 May 26 10:31 controlD64
crw-rw---- 1 root users 226, 128 May 26 10:31 renderD128
crw-rw---- 1 root users 226, 129 May 26 10:31 renderD129
crw-rw---- 1 root users 226, 130 May 26 10:31 renderD130
crw-rw---- 1 root users 226, 131 May 26 10:31 renderD131
I did not run vglserver_config on the server, though, after updating VGL. But as I looked into the commit that added EGL, I got the impression that the only thing that was added to the config script was adding write permissions to the DRI devices, which we already have set up. Is there something else that needs to be set?
There are spurious rumors that this either already is possible or will be possible soon with the nVidia drivers, by using EGL, but it is unclear exactly how (the Khronos EGL headers still seem to indicate that Xlib is required when using EGL on Un*x.) As soon as it is possible to do this, it would be a great enhancement for VirtualGL, since it would eliminate the need for a running X server on the server machine. I already know basically how to make such a system work in VirtualGL, because Sun used to have a proprietary API (GLP) that allowed us to accomplish the same thing on SPARC. Even as early as 2007, we identified EGL as a possible replacement for GLP, but Linux driver support for it has only recently become available, and even where it is available, EGL still seems to be tied to X11 on Un*x systems. It is assumed that, eventually, that will have to change in order to support Wayland.