VirtualGL / virtualgl

Main VirtualGL repository
https://VirtualGL.org

Access the GPU without going through an X server #10

Closed (dcommander closed this issue 4 years ago)

dcommander commented 9 years ago

There are spurious rumors that this either already is possible or will be possible soon with the nVidia drivers, by using EGL, but it is unclear exactly how (the Khronos EGL headers still seem to indicate that Xlib is required when using EGL on Un*x.) As soon as it is possible to do this, it would be a great enhancement for VirtualGL, since it would eliminate the need for a running X server on the server machine. I already know basically how to make such a system work in VirtualGL, because Sun used to have a proprietary API (GLP) that allowed us to accomplish the same thing on SPARC. Even as early as 2007, we identified EGL as a possible replacement for GLP, but Linux driver support for it has only recently become available, and even where it is available, EGL still seems to be tied to X11 on Un*x systems. It is assumed that, eventually, that will have to change in order to support Wayland.

tonyhb commented 9 years ago

This would be awesome

dcommander commented 9 years ago

This functionality is indeed available in the latest nVidia driver, but I don't have it fully working yet. I can access the GPU device through EGL without an X server, create a Pbuffer, and (seemingly) render something to it, but I can't make glReadPixels() function properly, and I'm a little fuzzy on how double buffering and stereo can be implemented, as it seems like EGL doesn't support double buffered or stereo Pbuffers. Emulating double buffering and stereo using multiple single-buffered Pbuffers is certainly possible, but it would greatly increase the complexity of VirtualGL. Waiting for feedback from nVidia.

dcommander commented 8 years ago

After discussing at length with nVidia, it appears that there are a couple of issues blocking this:

ISSUE:

SOLUTION:

ISSUE:

POSSIBLE SOLUTIONS:

dcommander commented 8 years ago

Simple program to demonstrate OpenGL rendering without an X server: git clone https://gist.github.com/dcommander/ee1247362201552b2532
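
For anyone curious about the general shape of such a program, here is a minimal sketch (not the contents of the gist) showing the basic sequence, assuming the EGL_EXT_device_enumeration / EGL_EXT_platform_device extensions and a driver that exposes desktop OpenGL through EGL:

```c
// Sketch: off-screen OpenGL rendering through the EGL device platform,
// with no X server.  Error checking is minimal; names are illustrative.
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GL/gl.h>
#include <stdio.h>

int main(void)
{
    // The device-enumeration entry points are extensions, so they must be
    // resolved at run time.
    PFNEGLQUERYDEVICESEXTPROC queryDevices =
        (PFNEGLQUERYDEVICESEXTPROC)eglGetProcAddress("eglQueryDevicesEXT");
    PFNEGLGETPLATFORMDISPLAYEXTPROC getPlatformDisplay =
        (PFNEGLGETPLATFORMDISPLAYEXTPROC)eglGetProcAddress("eglGetPlatformDisplayEXT");
    if (!queryDevices || !getPlatformDisplay)
    {
        fprintf(stderr, "EGL device extensions unavailable\n");
        return 1;
    }

    // Enumerate the GPUs and open an EGLDisplay on the first one.
    EGLDeviceEXT devices[16];
    EGLint nDevices = 0;
    queryDevices(16, devices, &nDevices);
    if (nDevices < 1) { fprintf(stderr, "No EGL devices\n");  return 1; }
    EGLDisplay dpy = getPlatformDisplay(EGL_PLATFORM_DEVICE_EXT, devices[0], NULL);
    eglInitialize(dpy, NULL, NULL);

    // Choose a config that supports Pbuffers and desktop OpenGL.
    EGLint configAttribs[] =
    {
        EGL_SURFACE_TYPE, EGL_PBUFFER_BIT,
        EGL_RENDERABLE_TYPE, EGL_OPENGL_BIT,
        EGL_RED_SIZE, 8, EGL_GREEN_SIZE, 8, EGL_BLUE_SIZE, 8,
        EGL_NONE
    };
    EGLConfig config;  EGLint nConfigs = 0;
    eglChooseConfig(dpy, configAttribs, &config, 1, &nConfigs);

    // Create a (single-buffered) Pbuffer and a desktop OpenGL context.
    EGLint pbAttribs[] = { EGL_WIDTH, 256, EGL_HEIGHT, 256, EGL_NONE };
    EGLSurface pb = eglCreatePbufferSurface(dpy, config, pbAttribs);
    eglBindAPI(EGL_OPENGL_API);
    EGLContext ctx = eglCreateContext(dpy, config, EGL_NO_CONTEXT, NULL);
    eglMakeCurrent(dpy, pb, pb, ctx);

    // Render something trivial and read it back.
    glClearColor(1.0f, 0.0f, 0.0f, 1.0f);
    glClear(GL_COLOR_BUFFER_BIT);
    unsigned char pixel[4] = { 0, 0, 0, 0 };
    glReadPixels(0, 0, 1, 1, GL_RGBA, GL_UNSIGNED_BYTE, pixel);
    printf("Read back: %d %d %d %d\n", pixel[0], pixel[1], pixel[2], pixel[3]);

    eglMakeCurrent(dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT);
    eglTerminate(dpy);
    return 0;
}
```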

dcommander commented 6 years ago

Popping the stack on this old thread, because I've started re-investigating how best to accomplish this, and I've been tinkering with some code over the past few days to explore what's now possible, since it's been two years since I last visited it. AFAICT (awaiting nVidia's confirmation), the situation is still the same with respect to EGL, which is that multi-view Pbuffers don't exist. That leaves us with the quandary of how to emulate these GLX features:

  1. Double buffering. The lack of a multi-view Pbuffer EGL extension would require that we:
    1. emulate Pbuffers using FBOs, since double-buffered Pbuffers wouldn't exist. Currently VirtualGL emulates OpenGL windows using Pbuffers, but in the new implementation, it would have to emulate Pbuffers as well. We could probably still create a 1x1 dummy Pbuffer for each OpenGL window, which would at least allow us to maintain the 1:1 relationship between Drawable handles on the 2D X server and GLXDrawable handles on the 3D X server (or EGLSurfaces), but the actual structure of the emulated Pbuffer would be implemented with a "Drawable FBO" (and appropriate RBO attachments to emulate the back, stencil, and depth buffers.) This is problematic, since we'd be attempting to map a lower-level OpenGL feature to a higher-level GLX feature. The requirements would include, but would probably not be limited to:
      1. Interposing glReadBuffer(), glDrawBuffer(), glDrawBuffers(), glNamedFramebufferReadBuffer(), and glNamedFramebufferDrawBuffer() (VGL already interposes glDrawBuffer()) and redirecting GL_FRONT, GL_BACK, GL_FRONT_AND_BACK, etc. to the appropriate GL_COLOR_ATTACHMENTx target (in the case of GL_FRONT_AND_BACK, this would require calling down to glDrawBuffers().) Fortunately, it appears to be an error to call glDrawBuffer() or glReadBuffer() with a target of GL_BACK/GL_FRONT/etc. whenever an FBO other than 0 is bound, so VirtualGL can similarly trigger an OpenGL error if those targets are used without the Drawable FBO being bound. (See the sketch after this list.)
      2. Interposing glBindFramebuffer() in order to redirect Buffer 0 to the Drawable FBO.
      3. Interposing glGet*() in order to return values for GL_DOUBLEBUFFER, GL_DRAW_BUFFER, GL_DRAW_BUFFERi, GL_DRAW_FRAMEBUFFER_BINDING, GL_READ_FRAMEBUFFER_BINDING, GL_READ_BUFFER, and GL_RENDERBUFFER_BINDING that make sense from the application's point of view.
    2. emulate GLXFBConfigs somehow, since the GLXFBConfig or EGLConfig of the emulated Pbuffer would not necessarily represent its visual properties. This would likely require that VGL maintain a central table of internal FB configs; perform its own sorting algorithms within the body of glXChooseVisual(), glXChooseFBConfig(), and similar functions; and return its own internal structure pointers to the application when the application requests a GLXFBConfig. This is feasible, but it's difficult and fraught with potential compatibility issues.
    3. emulate GLX_PRESERVED_CONTENTS (Hopefully we don't need to? Otherwise, I have no clue), GLX_MAX_PBUFFER_WIDTH and GLX_MAX_PBUFFER_HEIGHT (could map to GL_MAX_FRAMEBUFFER_WIDTH and GL_MAX_FRAMEBUFFER_HEIGHT), and GLX_LARGEST_PBUFFER.
  2. Quad-buffered stereo. If we have to use FBOs to emulate double-buffered Pbuffers, then this would be an easy addition. Otherwise, I don't mind relegating this feature to the GLX back end only. I'm trying to figure out the industry direction on stereographic 3D rendering in general, because at the moment, it doesn't even appear possible to use quad-buffered stereo in OpenGL without using GLX. Furthermore, the only VGL configuration that supports quad-buffered stereo is the VGL Transport with a Linux client that has stereo capabilities. That configuration is useful for accessing visualization supercomputers remotely across a LAN, so it's definitely something I want to continue supporting, but it doesn't necessarily need to be supported with X-server-less GPU access.
  3. Aux. buffers. If we have to use FBOs to emulate double-buffered Pbuffers, then this would be an easy addition. Otherwise, this feature won't be available with X-server-less GPU access (aux. buffers were obsoleted in OpenGL 3.1 anyhow.)
  4. Accumulation buffers. These can't be emulated with FBOs, so if we have to use FBOs to emulate Pbuffers, then support for accumulation buffers will simply not exist in VirtualGL anymore. Accumulation buffers were also obsoleted in OpenGL 3.1, but why do I have a sinking feeling that there are still some commercial applications out there that use them? I guess such applications would have to be stuck on VGL 2.5.x if the use of FBOs proves necessary.
  5. Floating point pixels and other esoteric Pbuffer configurations that the nVidia drivers support. No idea even where to begin emulating such things using FBOs.
  6. Texture-from-pixmap. There appears to be an EGL extension for this, but no idea whether it supports desktop OpenGL or just OpenGL ES.
  7. Buffer swapping. If we have to emulate Pbuffers using FBOs, hopefully we can get away with simply swapping the color attachments.
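
To illustrate 1.i.a above, here is a rough sketch of what the glDrawBuffer() interposition could look like. The DrawableFBO structure, getCurrentDrawableFBO(), and the _glDrawBuffer/_glDrawBuffers function pointers are illustrative stand-ins, not VirtualGL's actual internals:

```c
#include <GL/gl.h>
#include <GL/glext.h>

// Illustrative per-drawable state for an emulated Pbuffer ("Drawable FBO").
typedef struct
{
    GLuint fbo;              // FBO that stands in for the GLX drawable
    GLenum frontAttachment;  // e.g. GL_COLOR_ATTACHMENT0
    GLenum backAttachment;   // e.g. GL_COLOR_ATTACHMENT1
    int doubleBuffer;
} DrawableFBO;

// Hypothetical helpers:  look up the emulated drawable whose FBO is bound in
// the calling thread, and the "real" functions obtained via dlsym(RTLD_NEXT).
extern DrawableFBO *getCurrentDrawableFBO(void);
extern void (*_glDrawBuffer)(GLenum mode);
extern void (*_glDrawBuffers)(GLsizei n, const GLenum *bufs);

// Interposed glDrawBuffer():  redirect window-system buffer names to the
// color attachments of the emulated drawable's FBO.
void glDrawBuffer(GLenum mode)
{
    DrawableFBO *d = getCurrentDrawableFBO();
    if (!d)
    {
        // No emulated drawable FBO is bound (e.g. an application FBO is
        // bound), so pass the call through unmodified.
        _glDrawBuffer(mode);
        return;
    }
    switch (mode)
    {
        case GL_FRONT:
        case GL_FRONT_LEFT:
            _glDrawBuffer(d->frontAttachment);
            break;
        case GL_BACK:
        case GL_BACK_LEFT:
            _glDrawBuffer(d->doubleBuffer ? d->backAttachment : d->frontAttachment);
            break;
        case GL_FRONT_AND_BACK:
        {
            // Rendering to both buffers requires calling down to glDrawBuffers().
            GLenum bufs[2] = { d->frontAttachment, d->backAttachment };
            _glDrawBuffers(d->doubleBuffer ? 2 : 1, bufs);
            break;
        }
        default:
            // GL_NONE, GL_COLOR_ATTACHMENTi, etc. (stereo targets omitted here)
            _glDrawBuffer(mode);
            break;
    }
}
```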

Features that will likely have to be relegated to the legacy GLX back end only:

As you can see, this is already a potential compatibility minefield. It at least becomes a manageable minefield if we are able to retain the existing GLX Pbuffer back end and simply add an EGL Pbuffer back end to it (i.e. if a multi-view EGL Pbuffer extension is available.) That would leave open the possibility of reverting to the GLX Pbuffer back end if certain applications don't work with the EGL Pbuffer back end. However, since I can think of no sane way to use FBOs for the EGL back end without also using them for the GLX back end, if we're forced to use FBOs, essentially everything we currently know about VirtualGL's compatibility with commercial applications would have to be thrown out the window. Emulating Pbuffers with FBOs is so potentially disruptive to application compatibility that I would even entertain the notion of introducing a new interposer library just for the EGL back end, and retaining the existing interposers until the new back end can be shown to be as compatible (these new interposers could be selected in vglrun based on the value of VGL_DISPLAY.)

Maybe I'm being too paranoid, but in the 13 years I've been maintaining this project, I've literally seen every inadvisable thing that an application can possibly do with OpenGL or GLX. A lot of commercial OpenGL ISVs seem to have the philosophy that, as long as their application works on the specific platforms they support, it doesn't matter if the code is brittle, non-future-proof, or if it only works by accident because the display is local and the GPU is fast. Hence my general desire to not introduce potential compatibility problems into VirtualGL. The more we try to interpose the OpenGL API, the more problems we will potentially encounter, since that API changes a lot more frequently than GLX. There is unfortunately no inexpensive way to test a GLX/OpenGL implementation for conformance problems (accessing the Khronos conformance suites requires a $30,000 fee), and whereas some of the companies reselling VirtualGL in their own products have access to a variety of commercial applications for testing, I have no such access personally.

dcommander commented 6 years ago

Relabeling as "funding needed", since there is no way to pay for this project with the General Fund unless a multi-view Pbuffer extension for EGL materializes.

nimbixler commented 6 years ago

I'm thinking about funding this specific project. How do I do that? I'm happy to discuss offline, including the specifics around amount needed, etc. No corporate agenda other than interest in this feature (OpenGL offload without an X server) and willingness to fund it. Thanks!

Leo Reiter
CTO, Nimbix, Inc.

dcommander commented 6 years ago

@nimbixler please contact me offline: https://virtualgl.org/About/Contact. At the moment, it doesn't appear that nVidia is going to be able to come up with a multibuffer EGL extension, so this project is definitely doable but is likely to be costly. However, I really do think it's going to be necessary in order to move VGL forward, and this year would be a perfect time to do it.

dcommander commented 6 years ago

Pushed to a later release of VirtualGL, since 2.6 beta will land this month and there is no funding currently secured for this project.

dcommander commented 5 years ago

Re-tagging as "funding needed." I've completed the groundwork (Phase 1), which is now in the dev branch (with relevant changes that affect the stable branch placed in master.) However, due to budgetary constraints with the primary company that is sponsoring this, it appears that I'm going to need to split cost on the project across multiple companies in order to make it land in 2019.


Phase 1


Phase 2

Implementing the EGL back end

dcommander commented 5 years ago

@nimbixler did you get my e-mail? We could use any funding help you can muster on this.

al3x609 commented 5 years ago

This is an amazing first step. OpenGL direct rendering without an X server is an essential feature for the HPC world. Let me explain: I'm working with VirtualGL/TurboVNC/noVNC to deploy a remote visualization service on an HPC cluster, using a single node for remote viz, because the other nodes are used in compute mode with CUDA and other tools. What does that mean?

If we need to run an Xorg instance for remote viz on a GPU, that GPU cannot be shared between compute mode and the X Window System. (The user should be aware of certain limitations when handling both activities simultaneously on a single GPU. If no consideration is given to managing both sets of tasks simultaneously, the system may experience disturbances and hangs in the X Window System, leading to an interruption of X-related tasks such as display updates and rendering.)

So the HPC world needs a separate cluster for visualization, running a 3D X server on every node for this service. That isn't a good approach; the hardware requirements are very large. Sharing the same GPU between the X Window System and GPGPU compute mode would let both clusters be fused into a single layer. In-situ visualization also needs this approach for good performance, and sharing resources across the cluster would minimize costs.

EGL-based remote hardware rendering combined with a future WebAssembly service with H.264 encoding would be a good combination.

I'm sorry for my poor English. :)

dcommander commented 5 years ago

I have been looking at WebAssembly in the context of designing an in-browser TurboVNC viewer. So far, it seems to be not fully baked. I've gotten as far as building libjpeg-turbo (which requires disabling its SIMD extensions, since WASM doesn't support SIMD instructions yet) and LibVNCClient into WebAssembly code and running one of the LibVNCClient examples in a browser, but the WebAssembly sockets-to-WebSockets emulation layer doesn't work properly, and the program locks up the browser.

al3x609 commented 5 years ago

There is a GitHub project that tries to resolve this issue: a SIMD proposal based on SIMD.js, I think. 🤔

dcommander commented 5 years ago

Regarding the EGL back end, I have currently expended hundreds of hours of labor attempting to make it work with FBOs because nVidia refused to implement a multi-view Pbuffer extension for EGL. I am almost to the point of having to declare failure, which will mean that I cannot seek compensation for a good chunk of that labor. Unfortunately, it just appears that renderbuffer objects and render textures cannot be shared among OpenGL contexts, and that makes it impossible to fully use those structures to emulate the features of an OpenGL window or other drawable. If anyone has any ideas, please post them. I'm desperate.

dcommander commented 5 years ago

nVidia suggested a couple of ideas:

  1. creating an EGLImage from each renderbuffer or texture I use, since EGLImages can be shared among contexts. However, as far as I can tell, this can only be a one-way operation in desktop OpenGL, because whereas a function exists for creating an EGLImage from an RBO or texture, no such function exists for specifying that an RBO or texture in a different context should be created using storage from an existing EGLImage. I would need something similar to GL_OES_EGL_image, but for desktop OpenGL, not OpenGL ES.
  2. using Vulkan to create shared GPU memory regions, and using those shared GPU memory regions as the backing store for render textures (or maybe RBOs.) Problems:
    1. After extensive googling, I can't figure out how to do that.
    2. I'll do it if I have to, but I am hesitant to introduce a Vulkan dependency in VirtualGL. I can imagine some situations in which this would introduce compatibility issues.
    3. Why do I have a sinking feeling that whatever OpenGL functionality is necessary to make this work is only available in OpenGL 4? If so, then it would be a non-starter, since VirtualGL cannot impose any such requirements on the 3D application.

I'm still awaiting nVidia's response to my questions. I'm starting to lose hope, however. Most of the funding I secured for this feature was contingent upon successfully implementing it. I am currently at $13,000 worth of un-reimbursed labor on the feature, and if I can't figure out how to implement it, then I may be sunk. I don't have the ability to absorb that kind of loss right now. I normally don't engage in speculative blue-sky projects for exactly this reason, but this is also the first time I've ever encountered a hard technical roadblock like this in my 10 years of independent open source software development. I took a calculated risk that it would be possible to solve all of the problems associated with this feature, but the limitations of EGL may just make that impossible unless nVidia is willing to implement a multi-view EGL extension for Pbuffers (which, thus far, they have expressed great reluctance to do.) The other idea I initially presented in https://github.com/VirtualGL/virtualgl/issues/10#issuecomment-163030995 (using multiple Pbuffers to emulate multi-buffering) is a non-starter, since GLX allows applications to render to multiple buffers simultaneously, and that would be impossible to implement if the buffers were really drawables behind the scenes.

As I have had to implement the feature thus far, the EGL back end is already less compatible than the GLX back end, because there is no obvious way to implement:

Some of those may be possible to implement, but I just can't spend much more time on this. I have to at least get to proof-of-concept stage before I can even get paid for most of the work I've done thus far.

If this feature proves impossible, then that doesn't necessarily mean that VirtualGL is at a technical dead end. There are still proposed enhancements to it that would be meaningful, even with a GLX back end. However, the problem is funding. I only have one source of research funding right now, and this feature has largely exhausted it. Given the seeming impossibility of implementing Vulkan support in VirtualGL (which also, BTW, caused me to lose a potential funding source), the writing is pretty much on the wall. VirtualGL will remain useful for a certain class of application, but I also think we're probably approaching the point at which it will be necessary to implement GPU-accelerated remote display in some other way-- possibly by building TurboVNC upon Xwayland, for instance, and thus implementing hardware-accelerated OpenGL directly within the X proxy. There are probably 100 technical reasons why this wouldn't work, however, and even if it would, it is likely to require hundreds of hours of labor. There's a good chance that it would go the way of this feature, i.e. that I wouldn't discover the impassable technical roadblocks until I was hundreds of hours into the project, thus requiring me to eat five figures of labor cost again. Furthermore, such a feature would have the obvious disadvantage of requiring a particular X proxy in order to achieve GPU acceleration. On the surface, that would seemingly benefit me, since it would drive more users toward TurboVNC, but if other X proxies follow suit, then ultimately it would be a net loss for The VirtualGL Project as a whole, since I would only be receiving funded development on TurboVNC and not on both TurboVNC and VirtualGL.

If nVidia's ideas don't pan out, then I don't know much else that can be done here, short of someone putting pressure on them (and/or AMD) to implement a multi-view Pbuffer extension for EGL.

dcommander commented 5 years ago

WIP checked into dev.eglbackend branch: https://github.com/VirtualGL/virtualgl/tree/dev.eglbackend

dcommander commented 5 years ago

Just found https://www.khronos.org/registry/OpenGL/extensions/EXT/EXT_EGL_image_storage.txt. Will give it a try.
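
For reference, the usage pattern that extension appears to enable is roughly the following. This is only a sketch based on my reading of the spec (the shareTextureViaEGLImage() wrapper and its parameters are illustrative, and it assumes an EGL 1.5 display plus two contexts that are not in the same share group); whether it actually works with the nVidia drivers is another question:

```c
#define GL_GLEXT_PROTOTYPES
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GL/gl.h>
#include <GL/glext.h>
#include <stdint.h>

// Sketch: share a texture's storage between two OpenGL contexts that are NOT
// in the same share group, by exporting it as an EGLImage in one context and
// importing it via GL_EXT_EGL_image_storage in the other.  Error checking
// omitted.
void shareTextureViaEGLImage(EGLDisplay dpy, EGLSurface surf,
                             EGLContext ctxA, EGLContext ctxB,
                             int width, int height)
{
    // Context A:  create the texture and export it as an EGLImage.
    eglMakeCurrent(dpy, surf, surf, ctxA);
    GLuint texA = 0;
    glGenTextures(1, &texA);
    glBindTexture(GL_TEXTURE_2D, texA);
    glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA8, width, height);
    glBindTexture(GL_TEXTURE_2D, 0);
    EGLImage image = eglCreateImage(dpy, ctxA, EGL_GL_TEXTURE_2D,
                                    (EGLClientBuffer)(uintptr_t)texA, NULL);

    // Context B:  create a texture whose storage is the EGLImage.
    // glEGLImageTargetTexStorageEXT() is the piece that was previously
    // missing for desktop OpenGL (GL_OES_EGL_image is OpenGL ES-only.)
    PFNGLEGLIMAGETARGETTEXSTORAGEEXTPROC eglImageTexStorage =
        (PFNGLEGLIMAGETARGETTEXSTORAGEEXTPROC)
            eglGetProcAddress("glEGLImageTargetTexStorageEXT");
    eglMakeCurrent(dpy, surf, surf, ctxB);
    GLuint texB = 0;
    glGenTextures(1, &texB);
    glBindTexture(GL_TEXTURE_2D, texB);
    eglImageTexStorage(GL_TEXTURE_2D, (GLeglImageOES)image, NULL);
    // texB now refers to the same GPU storage as texA.
}
```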

dcommander commented 5 years ago

Unfortunately, GL_EXT_EGL_image_storage says that it requires OpenGL 4.2. That may be a show-stopper, since I can't impose that requirement upon OpenGL applications running with VirtualGL. Ugh. The other issue is that I don't think it will be possible to support multisampling with EGLImages, for reasons I described in this thread: https://devtalk.nvidia.com/default/topic/1056385/opengl/sharing-render-buffers-or-render-textures-among-multiple-opengl-contexts/post/5359805/#5359805

At the moment, I consider development on this to be stalled pending further ideas. I'm open to the possibility of using Vulkan if there is a straightforward way to do so, but I have no experience whatsoever with that API, and after extensive googling, I haven't been able to find the information I need regarding how to use Vulkan buffers as backing stores for textures or RBOs.

dcommander commented 5 years ago

At the moment, it's starting to appear as if using multiple single-buffered Pbuffers may be the least painful option. Although I can foresee a variety of issues that may prevent that approach from working, I can at least figure out whether it's viable with probably a day or less of work.

MadcowD commented 5 years ago

Hey @dcommander you got this!

ffeldhaus commented 5 years ago

@dcommander Did you figure out if using multiple single-buffered Pbuffers works? What is the current status? Would it be possible to create a first working version that allows running selected applications, e.g. glxspheres64? Will solving this issue also help solve #98? If it were possible to do visualisations of AI/ML/HPC applications with Docker / Kubernetes without requiring X11, that would be interesting for a lot of people and might help secure further funding.

dcommander commented 5 years ago

I am still trying to secure enough funding to cover my labor to look into the single-buffered Pbuffer approach. (Thank you for the donation, BTW. That certainly does help, and 100% of that money will go toward the aforementioned labor.) I hope to be able to do that work within the next few weeks. I have no idea regarding #98. That is a separate issue, and I haven't had time to look into it. Since that feature isn't specifically funded, my labor to work on it will have to be compensated from the VirtualGL General Fund, which only covers 200 hours/year (shared with TurboVNC.) Since the General Fund is usually exhausted six months into the fiscal year, I have to prioritize its use, and #98 isn't a very high priority right now. My main priority with VirtualGL is to figure out the EGL back end, because if I can reach proof of concept, I can unlock additional funding (which will compensate a lot of the speculative labor I have done already) and testing resources.

dcommander commented 5 years ago

The single-buffered Pbuffer approach did not pan out. For a variety of reasons, it would have proven to be a nastier solution than using FBOs, mainly because there was no clean way to implement rendering to multiple buffers simultaneously. GL_FRONT_AND_BACK may not be particularly commonplace, but depending on the buffer configuration, GL_BACK, GL_FRONT, GL_LEFT, and GL_RIGHT can also render to multiple buffers. Supporting that functionality would have required a complex, error-prone, and hard-to-maintain automatic buffer synchronization mechanism.

Fortunately, I finally got the information I needed in order to figure out how to use Vulkan to create RBOs backed by non-context-specific GPU memory. I am proceeding down that path and applying for additional R&D funding.

dcommander commented 4 years ago

Status update:

Still pursuing the idea of emulating Pbuffers using RBOs backed by Vulkan memory. Will push to the dev.eglbackend branch when I have it working well enough to run GLXspheres. I haven't had a chance to put in much work on it this month due to pressing issues with my other OSS projects.

Funding update:

Total hours spent thus far: 277.6
Estimated hours remaining to productization (slightly hopeful estimate): 60-70
Total: 337.6-347.6

Hours for which funding has already been secured: 167.8
Hours for which funding can be secured upon proof of concept: 71.4
Hours for which funding has been awarded but not yet secured (legal snafu, working on it): 100
Total: 339.2

dcommander commented 4 years ago

Update: the aforementioned 100 hours of funding has finally been secured.

dcommander commented 4 years ago

Update: while the funding was finally "secured", it hasn't yet been received, so that is currently holding up further development.

dcommander commented 4 years ago

The funding was received. This is next in the queue, after some high-priority TurboVNC work that has been promoted to the head of the queue due to the sudden spike in demand for remote work solutions in the U.S.

dcommander commented 4 years ago

The Vulkan-based Pbuffer emulator is now building successfully but isn't yet running due to an issue described here: https://forums.developer.nvidia.com/t/sharing-render-buffers-or-render-textures-among-multiple-opengl-contexts/77168/27

MadcowD commented 4 years ago

So would this enable hardware-accelerated TurboVNC servers without the presence of an underlying X-server?

dcommander commented 4 years ago

@MadcowD Referring to the diagrams here, this feature would quite simply eliminate the 3D X server and replace the GLX back end (green arrow) with an EGL back end. When used with the EGL back end, VirtualGL would become a GLX emulator rather than a GLX splitter/forwarder. It's not technically accurate to describe this as a TurboVNC feature, since TurboVNC doesn't technically require VirtualGL and vice versa.

dcommander commented 4 years ago

I might have figured out how to make this work using clever manipulation of EGL context sharing. Basically, the idea is (and I've verified that this works at the low level):

I'll keep you posted regarding my progress. Fortunately, the infrastructure to test the solution above was largely already developed in the context of prior failed experiments, so hopefully I can get it prototyped within the next week or two. It's potentially messier, in terms of code, than a Vulkan-based solution would have been, but a Vulkan-based solution appears to be a non-starter because of the fact that nVidia's Vulkan implementation seems to require an X display.
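
To give a rough picture of the shape of the idea (a sketch only; the function names, the per-FBConfig rboContext, and the 1x1 dummy Pbuffer argument are illustrative, not the actual VirtualGL code):

```c
#define GL_GLEXT_PROTOTYPES
#include <EGL/egl.h>
#include <GL/gl.h>
#include <GL/glext.h>

// Sketch of the context-sharing approach:  an "RBO context" owns the
// renderbuffers that back an emulated Pbuffer, and application-requested
// contexts are created with that RBO context as their share context, so the
// renderbuffer names are valid in them.  (FBOs themselves are not shareable,
// so each context attaches the shared RBOs to its own FBO.)  Error checking
// and locking omitted.

// Emulated glXCreateContext():  share with the RBO context.
EGLContext createAppContext(EGLDisplay dpy, EGLConfig config, EGLContext rboContext)
{
    eglBindAPI(EGL_OPENGL_API);
    return eglCreateContext(dpy, config, rboContext, NULL);
}

// Allocate one color buffer of an emulated Pbuffer, independently of any
// application context, by temporarily binding the RBO context to a 1x1 dummy
// Pbuffer surface.
GLuint createColorBufferRBO(EGLDisplay dpy, EGLSurface dummyPB,
                            EGLContext rboContext, int width, int height)
{
    GLuint rbo = 0;
    eglMakeCurrent(dpy, dummyPB, dummyPB, rboContext);
    glGenRenderbuffers(1, &rbo);
    glBindRenderbuffer(GL_RENDERBUFFER, rbo);
    glRenderbufferStorage(GL_RENDERBUFFER, GL_RGBA8, width, height);
    eglMakeCurrent(dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT);
    return rbo;
}
```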

dcommander commented 4 years ago

I can't seem to catch a break on this. I was making progress last week, but due to an unforeseen circumstance related to COVID-19, I have to move my office/lab over the next few days (a few weeks ahead of schedule), then I have to do my taxes for next week's deadline and fix some high-priority bugs that were just reported. I promise I'll get back to this research ASAP. I'm doing my best to keep about five balls in the air right now.

dcommander commented 4 years ago

The EGL context sharing idea is implemented and builds successfully, and GLXspheres works at the GLX level with no errors. I'm currently trying to sort out the emulation of glDrawBuffer() and glReadBuffer() so that GLXspheres will work at the OpenGL level as well (i.e. so it will actually produce an image.) I feel like I'm a few hours away from that, so hopefully I'll be able to declare a proof of concept early this coming week. The next step after getting GLXspheres to work will be getting fakerut to work, then I'll push the code and let people test the pre-release build with their applications of choice.

dcommander commented 4 years ago

GLXspheres is working! Lots of work left to do, but the concept seems to be solid.

dcommander commented 4 years ago

fakerut is passing all the way through the stereo readback heuristics tests, which means that the concept of multi-buffered Pbuffer emulation using RBOs is resoundingly proven.

dcommander commented 4 years ago

Another roadblock, unfortunately. Due to the GLX function call semantics, I was taking the approach of creating a single "RBO context" for every GLXFBConfig and sharing that RBO context with any OpenGL contexts that the 3D application requested to create with that GLXFBConfig. That allowed me to create and swap the RBOs independently of the application-requested contexts, which is necessary to properly emulate glXCreatePbuffer() and glXSwapBuffers(). Unfortunately, however, I discovered (experimentally-- I couldn't find any documentation to support this) that the RBO context has the same concurrency limitations as the application-requested contexts. That is, it can only be current in one thread at a time. Thus, I encountered a bunch of OpenGL data races when multiple threads tried to render to independent Pbuffers created with the same GLXFBConfig-- because, even though those threads had their own contexts, all of those contexts were sharing the same RBO context.

Ugh. I'm going to have to ponder how best to work around this problem. Ideas I had:

  1. I thought of creating a separate RBO context for each Pbuffer instance. However, that's problematic, because the RBO context has to be shared with the application-requested context in the body of glXCreate*Context*(), and we don't know at that point which drawable the application-requested context will be bound to.
  2. I thought of creating an opaque structure to represent a GLXContext when using the EGL back end and passing only that structure (metadata, basically) back to the application in glXCreate*Context*(). The application-requested context would actually be created on first use and shared with the Pbuffer-specific RBO context in the body of glXMake*Current(). However, that's also problematic, because nothing in GLX prevents an application-requested context from being bound to a completely different Pbuffer, and such would require me to somehow unshare the context with one Pbuffer's RBO context and re-share it with another Pbuffer's RBO context.

This strikes at the heart of the problem of how to emulate a non-context-specific construct using context-specific constructs. I'm going to have to either limit the EGL back end to single-threaded applications or return to the drawing board. Unfortunately, I'm now 40 hours over funding-- even including the funding that was preconditioned on a proof of concept (meaning that I haven't secured it yet.)

dcommander commented 4 years ago

Ignore most of the previous comment. I am sleep-deprived and forgot that shared contexts do not share the actual rendering state. Since my implementation ensures that any access or modification of the shared RBO handles is mutexed, as is any operation involving the RBO context, it seems as if my implementation is not to blame for most of the concurrency issues. I rewrote the multithreaded rendering tests in fakerut using raw EGL, with no shared contexts, and I see the same EGL data races there. I even tried using a completely different EGLDisplay for each thread, and I still see EGL data races. They appear to be unavoidable issues in nVidia's EGL implementation. Thus, I'll try to work around them as much as possible and move forward.

ffeldhaus commented 4 years ago

Can you elaborate a bit more on the impact? Will this be a showstopper or do you think you can go ahead with releasing a preview version? Also, is the implementation only working on nVidia GPUs or should it work for other GPUs as well?

dcommander commented 4 years ago

Currently only nVidia supports EGL device access. I have contacted AMD and encouraged them to support it as well.

I will still release a preview version. I'm just still experimenting to figure out how best to work around the concurrency issues.

dcommander commented 4 years ago

The worst case is that the preview version will not support multithreaded OpenGL rendering at all. I'm hoping I can find a better solution than that, though.

nimbixler commented 4 years ago

Is multithreaded OpenGL rendering a common use case in your experience? Or is it more of an exception?

Leo

dcommander commented 4 years ago

To be clear, when I say "multithreaded OpenGL rendering", I don't mean parallel rendering. I'm testing the implementation's ability to render to multiple "virtual windows" (Pbuffers) simultaneously with one OpenGL context per window and also to handle X window resize events that are initiated from a different thread than the rendering thread. I don't have a good sense of whether many applications actually do that, but those tests are mainly a measure of the stability of the implementation.

I went down this rabbit hole because the multithreading tests in fakerut were failing in sporadic ways, including:

  1. eglMakeCurrent() sometimes returns EGL_FALSE (but annoyingly, eglGetError() returns EGL_SUCCESS when that happens, making it difficult to diagnose the failure.)
  2. glClear() usually fails to clear one of the buffers to the correct color, which causes the rendering correctness check in TestThread::run() to fail for one or more threads.

When I refactored the multithreaded rendering tests using raw EGL and ran the tests through helgrind, I saw multiple data races in libnvidia-glsi and libEGL_nvidia, but neither of the aforementioned symptoms occurred. I was able to isolate (1) above and reproduce it consistently even with a single-threaded case, so I need to solve that problem before I can make any judgment regarding whether the fakerut issues are due to the EGL data races or something else.

Long story short: this is a quickly-evolving situation, so I'll keep you posted once I find out more.

peci1 commented 4 years ago

Is there a way to test the driver in a single-threaded application right now?

dcommander commented 4 years ago

Yes, the driver is just nVidia's standard driver (I'm using the latest-- 450.xx.) It installs the EGL libraries automatically.

If you mean the EGL back end I'm working on, no. I want it to pass fakerut before I push it for testing.
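
(If you just want to verify that the installed driver exposes EGL device access, checking the EGL client extension string is enough. A minimal sketch, linking only against libEGL:)

```c
#include <EGL/egl.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    // The client extension string is queried with no display and lists
    // platform extensions such as EGL_EXT_platform_device.  It is NULL if
    // the EGL library doesn't support EGL_EXT_client_extensions at all.
    const char *ext = eglQueryString(EGL_NO_DISPLAY, EGL_EXTENSIONS);
    if (ext && strstr(ext, "EGL_EXT_platform_device"))
        printf("EGL device access appears to be available\n");
    else
        printf("EGL device access not advertised:\n%s\n", ext ? ext : "(none)");
    return 0;
}
```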

dcommander commented 4 years ago

Well, I think I at least solved (1) (with a 1-line fix.) Turns out that EGL really does not like it if you try to bind a surface to a context in one thread without unbinding it in another thread first. That was apparently the source of the cryptic eglMakeCurrent() error. Still trying to figure out (2).
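
For anyone who hits the same thing, the pattern that avoids the error is to release the surface in the thread that currently owns it before binding it in another thread (a sketch; the display, surface, and context arguments are assumed to have been created elsewhere):

```c
#include <EGL/egl.h>

// Rendering thread A, when it is done with the surface:
void releaseSurface(EGLDisplay dpy)
{
    // Unbind before any other thread tries to bind the same surface.
    eglMakeCurrent(dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT);
}

// Rendering thread B, only after thread A has released the surface:
void acquireSurface(EGLDisplay dpy, EGLSurface pbuffer, EGLContext ctxB)
{
    eglMakeCurrent(dpy, pbuffer, pbuffer, ctxB);
}
```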

dcommander commented 4 years ago

The news is better. (2) was a two-pronged bug, and I've managed to fix one prong (a bug in the mapping of external read and draw buffer IDs to RBOs in the EGL back end's emulated version of glXMakeContextCurrent().) Still investigating the other prong.

dcommander commented 4 years ago

All concurrency issues fixed! Apparently the races in nVidia's EGL implementation were innocuous. The second prong of (2) was a bug in the EGL back end's emulated version of glXSwapBuffers(). Proceeding with code cleanup and review.

dcommander commented 4 years ago

The EGL back end has been pushed to the dev branch and is now available in the dev/3.0 evolving pre-release build.

Care and feeding notes:

Testing I've performed

Things that don't work yet:

Things that won't work:

Refer to the commit log for other notes.

At this point, I have spent approximately 100 hours more than there is available funding for. Many thanks to all who have donated and sponsored this feature thus far. If you have use for this feature and have not donated, please consider doing so. I am obligated to finish the feature on behalf of those who have sponsored it thus far, but I wasn't anticipating having to eat that much labor cost. That overage is due to numerous false starts, including being sent down the garden path vis-a-vis Vulkan (which couldn't work due to the fact that nVidia's implementation requires an X server) and numerous issues I encountered in the process of implementing the feature (including all of the aforementioned concurrency issues-- did I mention that emulating double-buffered and quad-buffered Pbuffers using FBOs is frickin' hard?!)

The good news is that this code is beyond proof-of-concept quality at this point. It's basically beta-quality, minus the two missing features and minus documentation.

peci1 commented 4 years ago

That's really great news!

I tested glxgears on my laptop and it worked.

On a server, I got an error though:

$ DISPLAY=:3 vglrun +v -d /dev/dri/card4 glxgears -info
[VGL] Shared memory segment ID for vglconfig: 28901396
[VGL] VirtualGL v2.6.80 64-bit (Build 20200826)
[VGL] Opening EGL device /dev/dri/card4
[VGL] WARNING: Could not set WM_DELETE_WINDOW on window 0x00200002
GL_RENDERER   = GeForce GTX 1080 Ti/PCIe/SSE2
GL_VERSION    = OpenGL ES 1.1 NVIDIA 418.74
GL_VENDOR     = NVIDIA Corporation
GL_EXTENSIONS = GL_EXT_debug_label GL_EXT_map_buffer_range GL_EXT_robustness GL_EXT_texture_compression_dxt1 GL_EXT_texture_compression_s3tc GL_EXT_texture_format_BGRA8888 GL_KHR_debug GL_EXT_memory_object GL_EXT_memory_object_fd GL_EXT_semaphore GL_EXT_semaphore_fd GL_NV_memory_attachment GL_NV_texture_compression_s3tc GL_OES_compressed_ETC1_RGB8_texture GL_EXT_compressed_ETC1_RGB8_sub_texture GL_OES_compressed_paletted_texture GL_OES_draw_texture GL_OES_EGL_image GL_OES_EGL_image_external GL_OES_EGL_sync GL_OES_element_index_uint GL_OES_extended_matrix_palette GL_OES_fbo_render_mipmap GL_OES_framebuffer_object GL_OES_matrix_get GL_OES_matrix_palette GL_OES_packed_depth_stencil GL_OES_point_size_array GL_OES_point_sprite GL_OES_rgb8_rgba8 GL_OES_read_format GL_OES_stencil8 GL_OES_texture_cube_map GL_OES_texture_npot GL_OES_vertex_half_float 
VisualID 33, 0x21
[VGL] ERROR: in readPixels--
[VGL]    346: GL_ARB_pixel_buffer_object extension not available

$ ll /dev/dri/     
total 0
drwxr-xr-x  2 root root       240 May 26 10:31 ./
drwxr-xr-x 20 root root      3940 Aug 25 12:00 ../
crw-rw----  1 root users 226,   0 May 26 10:31 card0
crw-rw----  1 root users 226,   1 May 26 10:31 card1
crw-rw----  1 root users 226,   2 May 26 10:31 card2
crw-rw----  1 root users 226,   3 May 26 10:31 card3
crw-rw----  1 root users 226,   4 May 26 10:31 card4
crw-rw----  1 root users 226,  64 May 26 10:31 controlD64
crw-rw----  1 root users 226, 128 May 26 10:31 renderD128
crw-rw----  1 root users 226, 129 May 26 10:31 renderD129
crw-rw----  1 root users 226, 130 May 26 10:31 renderD130
crw-rw----  1 root users 226, 131 May 26 10:31 renderD131

I did not run vglserver_config on the server after updating VGL, though. But as I looked into the commit that added EGL, I got the impression that the only thing that was added to the config script was adding write permissions to the DRI devices, which we already have set up. Is there something else that needs to be set?