Closed RyanCargan closed 6 years ago
@RyanCargan I can reproduce this. I get the same error on arch since I updated a few days ago.
@hhirsch Do you remember the exact date of that arch update, and the last arch update you did before that? (Before the error?) Sorry if I'm being unhelpful, I'm a noob at this and can't make much sense of the error logs yet.
GLX is referring to the OpenGL extension to the X Window System right? Is the issue something related to a change in Arch's "xorg-server" package? Its last update was 2018-01-31 06:19 UTC, commit history is here: https://git.archlinux.org/svntogit/packages.git/log/trunk?h=packages/xorg-server It's either this or a driver issue right? @hhirsch Which drivers are you using on arch?
Is `make run` working on other platforms?
@RyanCargan I am using linux41-nvidia. What puzzles me most is that other OpenGL applications run without problems; no other application has this problem. So far I've tried SDL_GetError() in a few locations. The only error message I was able to get was about controller input, so I temporarily disabled controller input in my working copy, and I no longer get any error messages from SDL_GetError(). However, this did not solve the problem: I am still getting the same error message! I am pretty sure that @GlPortal/developers on other platforms do not have the problem, but maybe one of the @GlPortal/developers on another platform can answer here.
@hhirsch Have you successfully compiled and run the program on Arch Linux before the update you're currently on? When was the last time it worked?
@RyanCargan I have last compiled and run GlPortal on 2018-03-10.
@hhirsch So that rules out an xorg-server update being the issue right? Since its last update was before 2018-03-10. It looks like the nvidia packages received several updates after 2018-03-10 though: https://www.archlinux.org/packages/?q=nvidia&sort=-last_update The only commit for the repo itself that might have come after the date you mentioned (2018-03-10) was turlututututu fixing #137 and that seems unrelated to this. Would rolling back Arch's nvidia drivers to a previous date within an Arch VM help? With narrowing down likely causes at least? Or am I barking up the wrong tree here?
@RyanCargan If you are brave enough, go for it.
@RyanCargan I can confirm: I get the same error on both the Nvidia and Intel GPUs on my laptop, so I don't think it is an xorg/device driver issue. Trouble within radix seems more likely (something like calling a gl function before the context is properly created), because I fixed a similar bug that made the program crash on exit: some gl functions were being called after context deletion.
See if you can check for this?
I did find a commit that still works for me but it is very old: d119f995ed1926c258275034b67408e43655b679
I will try to bisect my way to a newer commit.
28b942afd16ebf8dec448056de491cb6b4994f78 also still good.
Hmm. Bisect does not work so well. I do get some unrelated errors and thus bisect won't give me one single commit that broke the build.
We can also painstakingly step through the code and see if there are gl calls before context creation.
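As an alternative to stepping through by hand, a guard around each call could flag the first gl call made without a live context. The sketch below is only an illustration of that idea: `ContextTracker` and `guardedCall` are hypothetical names, not part of Radix or SDL, and the real engine would have to call `onContextCreated()` / `onContextDestroyed()` from its own window code.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper (not Radix code): remembers whether a GL context
// currently exists. The window code would have to update it by hand.
class ContextTracker {
public:
  void onContextCreated() { alive = true; }
  void onContextDestroyed() { alive = false; }
  bool isAlive() const { return alive; }

private:
  bool alive = false;
};

// Runs `call` only while the tracked context is alive. Otherwise it throws
// and names the offending call, so the first bad call site is easy to spot.
template <typename Fn>
auto guardedCall(const ContextTracker &ctx, const std::string &name, Fn call)
    -> decltype(call()) {
  if (!ctx.isAlive()) {
    throw std::logic_error("GL call without a live context: " + name);
  }
  return call();
}
```

Usage would look like `guardedCall(tracker, "glGenBuffers", [&] { glGenBuffers(1, &vbo); });`. This only catches ordering bugs in our own code; it says nothing about whether the loader itself can see the context.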
There is also a driver and xorg update that I'll try immediately.
Update did not help. I did start to debug but was unable to find anything useful as of yet.
I am closing in a bit
(gdb) break BaseGame::setup()
Going to continue some other time. ddd is not very nice.
I started debugging it, and it is indeed confusing. After the context is created in `radix::Window::create()`, the first gl function called is `glGenBuffers` inside `PhysicsDebugDraw::VBO`, which is where the application crashes.
So I went into `Window.cpp`, changed the context to 4.3 (to use the newer debug mode) and tried to print the error using radix's own `gl::DebugOutput`. Guess what? The application crashes again when registering our callback at `glDebugMessageCallback` :stuck_out_tongue: .
I looked for some time and found a libreoffice dev thread where this same error occurred (there was an obscure patch where they made the context current, and it worked for them). We have only one context, so making it current didn't make much sense, but I still tried `SDL_GL_MakeCurrent`, and of course that didn't work either.
Since GL's error reporting doesn't work, I'm now thinking there is some trouble with the context creation code itself (which is what `SDL` handles, so I'm a bit confused there too). `SDL_GetError` doesn't seem to be useful either.
@hhirsch Let's keep this open for now, till we look for solution for some more time?
@nilspin Did you comment out the Physics Debug Draw? If I understand you correctly then whatever gl-call comes first after context creation will crash.
Next thing I'll do is compare master with 28b942a (renderer working on current arch/manjaro) but the difference is probably the OpenGL Version. I consider porting this renderer to Radix since it is much more performant and feature complete.
@hhirsch After disabling `physicsDebugDraw`, the first set of gl calls occurs in `TextureLoader::uploadTexture`. However, all texture creation/binding calls are executed properly, and the crash occurs at `glGenerateMipmap`. Then I thought it might be a loader problem, so I tried building with `-DRADIX_GL_LOADER=glad`, and running the build immediately threw undefined symbol reference errors. I don't know how to use `glad` properly; I will have to read up a bit to get it working.
BTW @hhirsch @RyanCargan what desktop environment are you guys using, and what GPUs do you usually run your programs on? I'm using gnome with wayland, but the debugger on the `glportal` binary says it is still loading GLX from `libGL.so` (GLX uses X, which wayland does not support, but X programs are made to work through a backward-compatibility layer). So as a last resort I tried to build from a standard x11 desktop environment (xfce and lxde), but my display manager keeps crashing. If not using gnome, can you guys try to build from some other environment and let me know what errors you get?
Okay, I tried to find out more about wayland on my system (from here), and it seems I'm running X11 instead. Since the rest of my OpenGL programs are running fine, I am now even less sure it is a driver/middleware issue.
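For anyone else who wants to check which session type they are on, here is a minimal sketch that reads the `XDG_SESSION_TYPE` and `WAYLAND_DISPLAY` environment variables. The function names are made up for this example, and falling back to `WAYLAND_DISPLAY` is my own assumption about what is reasonable when `XDG_SESSION_TYPE` is unset.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Classifies the session from two environment values (nullptr means unset).
// Kept separate from getenv so the logic can be checked on its own.
std::string classifySession(const char *xdgSessionType,
                            const char *waylandDisplay) {
  if (xdgSessionType != nullptr && *xdgSessionType != '\0') {
    return xdgSessionType; // typically "x11" or "wayland"
  }
  if (waylandDisplay != nullptr && *waylandDisplay != '\0') {
    return "wayland"; // assumption: a Wayland socket implies a Wayland session
  }
  return "unknown";
}

// Reads the real environment of the current process.
std::string currentSession() {
  return classifySession(std::getenv("XDG_SESSION_TYPE"),
                         std::getenv("WAYLAND_DISPLAY"));
}
```

Printing `currentSession()` at startup would show what the session reports. Note that it describes the desktop session, not which windowing backend a particular program ends up using, so an X program on a wayland session would still report "wayland" here.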
@RyanCargan We can't rule out xorg updates on the grounds that the game still ran for me on 2018-03-10 since I don't do daily updates.
Some more system updates today.
Does not fix the error.
I'll keep comparing with the old renderer. If all fails we need to package the old renderer and add it as a submodule into GlPortal.
07ac40e is the first good commit I was able to find. This is still before the split.
@nilspin @wow2006 @turlututututu @RyanCargan Can you check the Renderer at this commit 07ac40e? You'll see that it was a lot more feature complete and the performance was good. It probably used rad instead of quaternions, but I'll be fine with that since the switch to quaternions never solved a problem we actually had at the time. Do you think we should extract this renderer and make it work with current master?
Wow, the nearest working commit is 3 years old, that's depressing! Now, since the new renderer will not support quaternions, we will need to convert quaternions to angles (which is the subject of an issue on radix engine). Anyway, I will check it.
@turlututututu Used to work perfectly. Something in Arch has changed. Unable to replicate on other systems so far.
I think I figured out the problem (but not the solution). Running apitrace I got this. The program crashes when libepoxy tries to internally call some mesa-specific function. I tried switching to GLAD, but it didn't work right out of the box, so we'll have to fix that as well.
I've posted this on the OpenGL forum, let's see what they can suggest.
Nice find!
I replaced glad with GLEW "just for testing" and the game works. Same result as @nilspin: the problem is with GLAD :+1:
I was not able to build with GLAD (I couldn't resolve a few linker errors), so I gave up on it and went back to libepoxy. The `Couldn't find GLX or EGL context` error comes from within libepoxy, not from OpenGL itself. I compiled radixengine with debug versions of the SDL and epoxy libraries and am stepping my way through those libs. It seems the context is not built correctly, because of which `glXGetProcAddress` (the function that loads function pointers from the device driver) fails. Confusingly, the first gl function, `glViewport`, doesn't cause a crash, but the second, `glGenBuffers`, does. This happens on both intel and nvidia.
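One possible explanation for the `glViewport` / `glGenBuffers` difference, which I have not confirmed against the libepoxy source: a loader may resolve very old core functions by direct symbol lookup in the GL library, and only go through the context-dependent path for newer ones. `glViewport` is GL 1.0 and `glGenBuffers` is GL 1.5, so they could land on different paths. The toy model below only illustrates that idea. `ToyLoader` and the 1.2 cutoff are assumptions made for the example, not libepoxy's real code.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Toy model of a GL function loader. NOT libepoxy's implementation.
struct ToyLoader {
  // Whether the loader managed to detect a current GLX/EGL context.
  bool contextDetected = false;
  // Assumed cutoff: functions up to GL 1.2 are looked up directly.
  int directLookupMaxVersion = 12;

  // Returns the name of the lookup path used for a function that entered
  // core GL in `coreVersion` (10 = GL 1.0, 15 = GL 1.5, ...).
  std::string resolve(const std::string &name, int coreVersion) const {
    if (coreVersion <= directLookupMaxVersion) {
      return "direct"; // plain symbol lookup, no context needed
    }
    if (!contextDetected) {
      throw std::runtime_error("Couldn't find GLX or EGL context (" + name + ")");
    }
    return "get-proc-address"; // context-dependent lookup
  }
};
```

In this model, with `contextDetected == false`, `resolve("glViewport", 10)` succeeds while `resolve("glGenBuffers", 15)` throws, which matches the order of events we see. If that is what happens, the context itself may be fine and only its detection inside the loader is failing.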
I don't know if this is the proper direction, but I am now reading the gl spec and going through the libepoxy source to see how the context is built. It's taking some time; I'll update when I find something.
So the issue is within libepoxy itself, since there are other users complaining about the same issue as well. anholt/libepoxy#164 is the exact problem we have. I'll try getting GLAD to work again until the bug in epoxy is fixed.
@nilspin fixed this in current Radix master.
@RyanCargan Please try again!
System info (uname -a output):
Using nvidia 390.42-2 proprietary drivers
All dependencies successfully installed
Relevant console output:
All console output: