GlPortal / glPortal

:video_game: Open Source teleportation based first person puzzle-platformer
http://glportal.de

"make run" failing on Arch Linux #141

Closed RyanCargan closed 6 years ago

RyanCargan commented 6 years ago

System info (uname -a output):

Linux archbang 4.15.9-1-ARCH #1 SMP PREEMPT Sun Mar 11 17:54:33 UTC 2018 x86_64 GNU/Linux

Using nvidia 390.42-2 proprietary drivers

All dependencies successfully installed

Relevant console output:


V          Window  OpenGL 3.2
glportal: Couldn't find current GLX or EGL context.

make[3]: *** [source/CMakeFiles/run.dir/build.make:57: source/CMakeFiles/run] Error 1
make[2]: *** [CMakeFiles/Makefile2:1045: source/CMakeFiles/run.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:1052: source/CMakeFiles/run.dir/rule] Error 2
make: *** [Makefile:411: run] Error 2

All console output:


[ryan@archbang build]$ make run
[100%] Built target RadixEntity
[100%] Built target versionInfo
[100%] Built target json11
[100%] Built target VHACD_LIB
[100%] Built target easy_profiler
[100%] Built target RadixEngine
Scanning dependencies of target glportal
[100%] Building CXX object source/CMakeFiles/glportal.dir/renderer/UiRenderer.cpp.o
[100%] Linking CXX executable glportal
[100%] Built target glportal
D XmlScreenLoader  Screen /home/ryan/dev/git_repos/glPortal/data//screens/title.xml loaded
D XmlScreenLoader  Screen /home/ryan/dev/git_repos/glPortal/data//screens/pause.xml loaded
D XmlScreenLoader  Screen /home/ryan/dev/git_repos/glPortal/data//screens/end.xml loaded
V          Config  mouse_move bound to look_analogue with sensitivity 0.006000
V          Config  mouse_button_left bound to fire_pimary
V          Config  mouse_button_right bound to fire_secondary
V          Config  stick_left bound to move_analogue with sensitivity 1.000000 and deadzone 0.400000
V          Config  stick_right bound to look_analogue with sensitivity 0.050000 and deadzone 0.400000
V          Config  button_a bound to jump
V          Config  trigger_right bound to fire_pimary with deadzone 0.300000
V          Config  trigger_left bound to fire_secondary with deadzone 0.300000
V          Config  W bound to forward
V          Config  Up bound to forward
V          Config  S bound to back
V          Config  Down bound to back
V          Config  A bound to left
V          Config  Left bound to left
V          Config  D bound to right
V          Config  Right bound to right
V          Config  Space bound to jump
V          Config  Backspace bound to jump
V          Config  Escape bound to pause
V          Config  Q bound to quit
V          Config  start set to default bind
I  GameController  BaseGame::setup() start;
I    SoundManager  SDL Audio system initialized
W    SoundManager  SDL_mixer Init failed: OGG support not available, sound disabled
V    SoundManager  fully initialized
V          Window  Number of joysticks 0
V          Window  OpenGL 3.2
glportal: Couldn't find current GLX or EGL context.

make[3]: *** [source/CMakeFiles/run.dir/build.make:57: source/CMakeFiles/run] Error 1
make[2]: *** [CMakeFiles/Makefile2:1045: source/CMakeFiles/run.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:1052: source/CMakeFiles/run.dir/rule] Error 2
make: *** [Makefile:411: run] Error 2

hhirsch commented 6 years ago

@RyanCargan I can reproduce this. I get the same error on Arch since I updated a few days ago.

RyanCargan commented 6 years ago

@hhirsch Do you remember the exact date of that Arch update, and of the last Arch update you did before that (before the error)? Sorry if I'm being unhelpful; I'm a noob at this and can't make much sense of the error logs yet.

RyanCargan commented 6 years ago

GLX refers to the OpenGL extension to the X Window System, right? Is the issue related to a change in Arch's "xorg-server" package? Its last update was 2018-01-31 06:19 UTC; the commit history is here: https://git.archlinux.org/svntogit/packages.git/log/trunk?h=packages/xorg-server It's either this or a driver issue, right? @hhirsch Which drivers are you using on Arch?

RyanCargan commented 6 years ago

Is make run working on other platforms?

hhirsch commented 6 years ago

@RyanCargan I am using linux41-nvidia. What puzzles me the most is that other OpenGL applications run without problems. No other application has this problem. So far I've tried SDL_GetError() in a few locations. The only error message I was able to get was about controller input, so I temporarily disabled controller input in my working copy, and I don't get any more error messages from SDL. However, this did not solve the problem: I am still getting the same GLX/EGL error! I am pretty sure that @GlPortal/developers on other platforms do not have the problem. But maybe one of the @GlPortal/developers on another platform can answer here.

RyanCargan commented 6 years ago

@hhirsch Have you successfully compiled and run the program on Arch Linux before the update you're currently on? When was the last time it worked?

hhirsch commented 6 years ago

@RyanCargan I have last compiled and run GlPortal on 2018-03-10.

RyanCargan commented 6 years ago

@hhirsch So that rules out an xorg-server update being the issue, right? Its last update was before 2018-03-10. It looks like the nvidia packages received several updates after 2018-03-10 though: https://www.archlinux.org/packages/?q=nvidia&sort=-last_update The only commit to the repo itself that might have come after the date you mentioned (2018-03-10) was turlututututu fixing #137, and that seems unrelated to this. Would rolling back Arch's nvidia drivers to a previous date within an Arch VM help, at least with narrowing down likely causes? Or am I barking up the wrong tree here?

hhirsch commented 6 years ago

@RyanCargan If you are brave enough, go for it.

nilspin commented 6 years ago

@RyanCargan I can confirm: I get the same error on both the Nvidia and Intel GPUs on my laptop, so I don't think it is an xorg/device driver issue. Trouble within radix seems more likely (something like calling a gl function before the context is properly created), because I fixed a similar bug that made the program crash on exit: some gl functions were being called after context deletion.
See if you can check for this?

hhirsch commented 6 years ago

I did find a commit that still works for me but it is very old: d119f995ed1926c258275034b67408e43655b679

I will try to bisect my way to a newer commit.

hhirsch commented 6 years ago

28b942afd16ebf8dec448056de491cb6b4994f78 also still good.

hhirsch commented 6 years ago

Hmm. Bisect does not work so well. I do get some unrelated errors and thus bisect won't give me one single commit that broke the build.
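
For commits that fail for unrelated reasons, git bisect has a skip mechanism: a script given to git bisect run can exit with status 125 to mark the current commit as untestable. A rough sketch of such a helper follows. The function name bisect_verdict and both command arguments are placeholders (assumptions, not part of the glPortal repo); only the error string is taken from the log above.

```shell
#!/bin/sh
# Hypothetical helper for `git bisect run`; the name and both command
# arguments are illustrative assumptions, not part of the glPortal repo.
#
# Usage: bisect_verdict BUILD_CMD RUN_CMD
#   returns 125 -> commit cannot be built, ask git bisect to skip it
#   returns 1   -> commit is "bad" (the GLX/EGL error shows up)
#   returns 0   -> commit is "good"
bisect_verdict() {
    build_cmd=$1
    run_cmd=$2

    # A build failure is unrelated to the bug being hunted: skip the commit.
    sh -c "$build_cmd" >/dev/null 2>&1 || return 125

    # Run the program and look for the error message from this issue.
    if sh -c "$run_cmd" 2>&1 | grep -q "Couldn't find current GLX or EGL context"; then
        return 1
    fi
    return 0
}
```

Placed in a script that ends by calling the function with the real build and launch commands, this could be handed to git bisect run after marking master as bad and 28b942a as good. A working game does not exit on its own, so the launch command would probably need a time limit (for example via timeout).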

hhirsch commented 6 years ago

We can also painstakingly step through the code and see if there are gl calls before context creation.

hhirsch commented 6 years ago

There is also a driver and xorg update that I'll try immediately.

hhirsch commented 6 years ago

Update did not help. I did start to debug but was unable to find anything useful as of yet.

hhirsch commented 6 years ago

I am closing in a bit: (gdb) break BaseGame::setup()

hhirsch commented 6 years ago

Going to continue some other time. ddd is not very nice.

nilspin commented 6 years ago

I started debugging it, and it is indeed confusing. After context is created in radix::Window::create() the first gl function called is glGenBuffers inside PhysicsDebugDraw::VBO - which is where the application crashes.

So I went into Window.cpp, changed the context to 4.3 (to use the newer debug mode) and tried to print the error using radix's own gl::DebugOutput. Guess what? The application crashes again when registering our callback at glDebugMessageCallback :stuck_out_tongue: .
I looked around for some time and found a LibreOffice dev thread where this same error occurred (there was an obscure patch where they made the context current and it worked for them). We have only one context, so making it current didn't make much sense, but I still tried SDL_GL_MakeCurrent and of course that didn't work either.

Since GL's error reporting doesn't work, I'm now thinking there is some trouble with the context creation code itself (which is what SDL handles, so I'm a bit confused there too).
SDL_GetError doesn't seem to be useful either.

@hhirsch Let's keep this open for now, till we look for solution for some more time?

hhirsch commented 6 years ago

@nilspin Did you comment out the Physics Debug Draw? If I understand you correctly then whatever gl-call comes first after context creation will crash.

Next thing I'll do is compare master with 28b942a (renderer working on current arch/manjaro) but the difference is probably the OpenGL Version. I consider porting this renderer to Radix since it is much more performant and feature complete.

nilspin commented 6 years ago

@hhirsch After disabling physicsDebugDraw, the first set of gl calls occurs in TextureLoader::uploadTexture. However, all texture creation/binding calls are executed properly, and the crash occurs at glGenerateMipmap. Then I thought it might be a loader problem, so I tried building with -DRADIX_GL_LOADER=glad, and running the build immediately threw up undefined symbol reference errors. I don't know how to use glad properly; I will have to read up a bit to get it working.

BTW @hhirsch @RyanCargan what desktop environment are you guys using? And what GPUs do you usually run your programs on? I'm using GNOME with Wayland, but the debugger on the glportal binary says it is still loading GLX from libGL.so (GLX uses X, which Wayland does not support natively, but X programs are made to work through a backward-compatibility layer). So as a last resort I tried to build from a standard X11 desktop environment (Xfce and LXDE), but my display manager keeps crashing. If you are not using GNOME, can you guys try to build from some other environment and let me know what errors you get?

nilspin commented 6 years ago

Okay, I tried to find out more about Wayland on my system (from here) and it seems I'm running X11 instead. Since the rest of my OpenGL programs are running fine, I am now even less sure it is a driver/middleware issue.
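
For anyone else who wants to check which display server their session uses, here is a small sketch. It assumes a login session that exports XDG_SESSION_TYPE (as systemd-logind sessions normally do) and otherwise falls back to the WAYLAND_DISPLAY and DISPLAY variables; the function name display_server is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report the display server of the current session.
display_server() {
    if [ -n "${XDG_SESSION_TYPE:-}" ]; then
        # Normally "x11", "wayland" or "tty".
        printf '%s\n' "$XDG_SESSION_TYPE"
    elif [ -n "${WAYLAND_DISPLAY:-}" ]; then
        # Checked before DISPLAY, because a Wayland session that runs
        # XWayland usually has DISPLAY set as well.
        echo wayland
    elif [ -n "${DISPLAY:-}" ]; then
        echo x11
    else
        echo unknown
    fi
}

display_server
```

Note that even under a Wayland session an X11/GLX program can still run through the compatibility layer, so this only tells you about the session, not about what an individual binary loads.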

hhirsch commented 6 years ago

@RyanCargan We can't rule out xorg updates on the grounds that the game still ran for me on 2018-03-10 since I don't do daily updates.
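
What matters is when a package was actually upgraded on a given machine, and the local pacman log records that. A small sketch, assuming the default log location /var/log/pacman.log and its usual "[timestamp] [ALPM] upgraded name (old -> new)" line layout; the function name upgrades_since is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list package upgrades on or after a given date.
# Usage: upgrades_since LOGFILE YYYY-MM-DD REGEX
upgrades_since() {
    log=$1
    since=$2
    pattern=$3
    awk -v since="$since" -v pat="$pattern" '
        / upgraded / {
            # The first field starts with "[YYYY-MM-DD"; take the date part.
            day = substr($1, 2, 10)
            # ISO dates compare correctly as plain strings.
            if (day >= since && $0 ~ pat) print
        }
    ' "$log"
}

# Example (assumed default log path on Arch):
# upgrades_since /var/log/pacman.log 2018-03-10 'nvidia|xorg-server'
```

Comparing that output between a machine where the game still runs and one where it fails might narrow down the suspect packages better than the repository's last-update dates.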

hhirsch commented 6 years ago

Some more system updates today.

Does not fix the error.

hhirsch commented 6 years ago

I'll keep comparing with the old renderer. If all else fails, we need to package the old renderer and add it as a submodule to GlPortal.

hhirsch commented 6 years ago

07ac40e is the first good commit I was able to find. This is still before the split.

hhirsch commented 6 years ago

@nilspin @wow2006 @turlututututu @RyanCargan Can you check the renderer at this commit, 07ac40e? You'll see that it was a lot more feature complete and the performance was good. It probably used rad instead of quaternions, but I'll be fine with that since the switch to quaternions never solved a problem we had at the time. Do you think we should extract this renderer and make it work with current master?

turlututututu commented 6 years ago

Wow, the nearest working commit is 3 years old, that's depressing! Now, since the new renderer will not support quaternions, we will need to convert quaternions to angles (which is the subject of an issue on the radix engine). Anyway, I will check it.

hhirsch commented 6 years ago

@turlututututu Used to work perfectly. Something in Arch has changed. Unable to replicate on other systems so far.

nilspin commented 6 years ago

I think I figured out the problem (but not the solution). Running apitrace I got this. The program crashes when libepoxy tries to internally call some Mesa-specific function. I tried switching to GLAD but it didn't work right out of the box, so we'll have to fix that as well.
I've posted this on the OpenGL forum, let's see what they can suggest.

turlututututu commented 6 years ago

Nice find!

wow2006 commented 6 years ago

I replaced glad with GLEW, just for testing, and the game works. Same result as @nilspin: the problem is with GLAD. :+1:

hhirsch commented 6 years ago

Some options: https://www.khronos.org/opengl/wiki/OpenGL_Loading_Library

nilspin commented 6 years ago

I was not able to build with GLAD (I couldn't resolve a few linker errors), so I gave up on it and went back to libepoxy. The "Couldn't find current GLX or EGL context" error comes from within libepoxy, not from OpenGL itself. I compiled RadixEngine against debug versions of the SDL and epoxy libraries and am stepping my way through those libs. It seems the context is not built correctly, which makes glXGetProcAddress (the function that loads function pointers from the device driver) fail. Confusingly, the first gl function, glViewport, doesn't cause a crash, but the second, glGenBuffers, does. This happens on both Intel and Nvidia.
I don't know if this is the right direction, but I am now reading the GL spec and going through the libepoxy source to see how the context is built. It's taking some time; I'll update when I find something.

nilspin commented 6 years ago

So the issue is within libepoxy itself, since other users are complaining about the same issue as well. anholt/libepoxy#164 is the exact problem we have. I'll try again to get GLAD working until the bug in epoxy is fixed.

hhirsch commented 6 years ago

@nilspin fixed this in current Radix master.

hhirsch commented 6 years ago

@RyanCargan Please try again!