LinuxCNC / linuxcnc

LinuxCNC controls CNC machines. It can drive milling machines, lathes, 3d printers, laser cutters, plasma cutters, robot arms, hexapods, and more.
http://linuxcnc.org/
GNU General Public License v2.0

OpenGL problem on Bookworm #2264

Closed SebKuzminsky closed 1 year ago

SebKuzminsky commented 1 year ago

On an up-to-date install of Bookworm, with linuxcnc 2.9 built into a deb and installed, linuxcnc fails to start with this error:

LINUXCNC - 2.9.0~pre1
Machine configuration directory is '/home/seb/linuxcnc/configs/sim.axis'
Machine configuration file is 'axis.ini'
Starting LinuxCNC...
linuxcnc TPMOD=tpmod HOMEMOD=homemod EMCMOT=motmod
Note: Using POSIX non-realtime
Found file(lib): /usr/share/linuxcnc/hallib/core_sim.hal
Found file(lib): /usr/share/linuxcnc/hallib/sim_spindle_encoder.hal
Found file(lib): /usr/share/linuxcnc/hallib/axis_manualtoolchange.hal
Found file(lib): /usr/share/linuxcnc/hallib/simulated_home.hal
Found file(lib): /usr/share/linuxcnc/hallib/check_xyz_constraints.hal
link (updating variable file): No such file or directory
Traceback (most recent call last):
  File "/usr/bin/axis", line 26, in <module>
    from OpenGL.GLUT import *
  File "/usr/lib/python3/dist-packages/OpenGL/GLUT/__init__.py", line 5, in <module>
    from OpenGL.GLUT.fonts import *
  File "/usr/lib/python3/dist-packages/OpenGL/GLUT/fonts.py", line 20, in <module>
    p = platform.getGLUTFontPointer( name )
  File "/usr/lib/python3/dist-packages/OpenGL/platform/baseplatform.py", line 350, in getGLUTFontPointer
    raise NotImplementedError( 
NotImplementedError: Platform does not define a GLUT font retrieval function
Shutting down and cleaning up LinuxCNC...
task: 386 cycles, min=0.000009, max=0.006394, avg=0.001097, 0 latency excursions (> 10x expected cycle time of 0.001000s)
Note: Using POSIX non-realtime
LinuxCNC terminated with an error.  You can find more information in the log:
    /home/seb/linuxcnc_debug.txt
and
    /home/seb/linuxcnc_print.txt
as well as in the output of the shell command 'dmesg' and in the terminal

The issue has been discussed on the forum here, without resolution: https://forum.linuxcnc.org/9-installing-linuxcnc/47468-python-issues-on-bookworm

The issue is reproducible without involving LinuxCNC at all:

$ python3 -c "from OpenGL.GLUT import *"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/OpenGL/GLUT/__init__.py", line 5, in <module>
    from OpenGL.GLUT.fonts import *
  File "/usr/lib/python3/dist-packages/OpenGL/GLUT/fonts.py", line 20, in <module>
    p = platform.getGLUTFontPointer( name )
  File "/usr/lib/python3/dist-packages/OpenGL/platform/baseplatform.py", line 350, in getGLUTFontPointer
    raise NotImplementedError( 
NotImplementedError: Platform does not define a GLUT font retrieval function

From https://github.com/MPI-IS/mesh/issues/49, here's a workaround that gets past the GLUT import error, but linuxcnc still fails shortly thereafter:

$ export PYOPENGL_PLATFORM=osmesa
$ python3 -c "from OpenGL.GLUT import *"
$ linuxcnc
LINUXCNC - 2.9.0~pre1
Machine configuration directory is '/home/seb/linuxcnc/configs/sim.axis'
Machine configuration file is 'axis.ini'
Starting LinuxCNC...
linuxcnc TPMOD=tpmod HOMEMOD=homemod EMCMOT=motmod
Note: Using POSIX non-realtime
Found file(lib): /usr/share/linuxcnc/hallib/core_sim.hal
Found file(lib): /usr/share/linuxcnc/hallib/sim_spindle_encoder.hal
Found file(lib): /usr/share/linuxcnc/hallib/axis_manualtoolchange.hal
Found file(lib): /usr/share/linuxcnc/hallib/simulated_home.hal
Found file(lib): /usr/share/linuxcnc/hallib/check_xyz_constraints.hal
Traceback (most recent call last):
  File "/usr/bin/axis", line 24, in <module>
    from OpenGL.GL import *
  File "/usr/lib/python3/dist-packages/OpenGL/GL/__init__.py", line 4, in <module>
    from OpenGL.GL.VERSION.GL_1_1 import *
  File "/usr/lib/python3/dist-packages/OpenGL/GL/VERSION/GL_1_1.py", line 14, in <module>
    from OpenGL.raw.GL.VERSION.GL_1_1 import *
  File "/usr/lib/python3/dist-packages/OpenGL/raw/GL/VERSION/GL_1_1.py", line 7, in <module>
    from OpenGL.raw.GL import _errors
  File "/usr/lib/python3/dist-packages/OpenGL/raw/GL/_errors.py", line 4, in <module>
    _error_checker = _ErrorChecker( _p, _p.GL.glGetError )
AttributeError: 'NoneType' object has no attribute 'glGetError'
Shutting down and cleaning up LinuxCNC...
task: 322 cycles, min=0.000007, max=0.004341, avg=0.001073, 0 latency excursions (> 10x expected cycle time of 0.001000s)
Note: Using POSIX non-realtime
LinuxCNC terminated with an error.  You can find more information in the log:
    /home/seb/linuxcnc_debug.txt
and
    /home/seb/linuxcnc_print.txt
as well as in the output of the shell command 'dmesg' and in the terminal

SebKuzminsky commented 1 year ago

The error goes away if I switch from the default Gnome session (wayland) to the "GNOME On X11" session type, so that echo $XDG_SESSION_TYPE says x11.

This screenshot is from Buster, but it works the same on Bookworm: [image: gnome-on-x11]
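
A quick way to see which session type a process thinks it is running under is to inspect the environment variables mentioned above. This is only a sketch: the helper name `detect_session` is hypothetical, and the fallback order (XDG_SESSION_TYPE first, then WAYLAND_DISPLAY, then DISPLAY) is an assumption, not something LinuxCNC or PyOpenGL is known to do.

```python
import os
from typing import Mapping, Optional


def detect_session(env: Optional[Mapping[str, str]] = None) -> str:
    """Guess the display session type from environment variables.

    Hypothetical helper: prefers XDG_SESSION_TYPE, then falls back to
    WAYLAND_DISPLAY / DISPLAY. Returns "wayland", "x11", or "unknown".
    """
    if env is None:
        env = os.environ
    declared = env.get("XDG_SESSION_TYPE", "").strip().lower()
    if declared in ("wayland", "x11"):
        return declared
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "unknown"


if __name__ == "__main__":
    print(detect_session())
```

Passing an explicit mapping instead of reading `os.environ` makes it easy to check what the function would report for an environment copied from another login.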

petterreinholdtsen commented 1 year ago

[Sebastian Kuzminsky]

The error goes away if I switch from the default Gnome session (wayland) to the "GNOME On X11" session type, so that echo $XDG_SESSION_TYPE says x11.

Sounds like yet another feature not implemented by Wayland. :(

There seem to be several issues with OpenGL and Bookworm; I ran into #1599. Perhaps the OpenGL code needs some refurbishing. :)

-- Happy hacking Petter Reinholdtsen

SebKuzminsky commented 1 year ago

I can also work around this problem (when running on Wayland) by setting the environment variable PYOPENGL_PLATFORM to x11 before launching linuxcnc:

$ echo $XDG_SESSION_TYPE
wayland

$ python3 -c 'from OpenGL.GLUT import *'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/OpenGL/GLUT/__init__.py", line 5, in <module>
    from OpenGL.GLUT.fonts import *
  File "/usr/lib/python3/dist-packages/OpenGL/GLUT/fonts.py", line 20, in <module>
    p = platform.getGLUTFontPointer( name )
  File "/usr/lib/python3/dist-packages/OpenGL/platform/baseplatform.py", line 350, in getGLUTFontPointer
    raise NotImplementedError( 
NotImplementedError: Platform does not define a GLUT font retrieval function

$ PYOPENGL_PLATFORM=x11 python3 -c 'from OpenGL.GLUT import *'

$ PYOPENGL_PLATFORM=x11 linuxcnc
LINUXCNC - 2.9.0~pre1
Machine configuration directory is '/home/seb/linuxcnc/configs/sim.axis'
Machine configuration file is 'axis.ini'
Starting LinuxCNC...
linuxcnc TPMOD=tpmod HOMEMOD=homemod EMCMOT=motmod
Note: Using POSIX non-realtime
Found file(lib): /home/seb/linuxcnc-hacking/linuxcnc-dev/lib/hallib/core_sim.hal
Found file(lib): /home/seb/linuxcnc-hacking/linuxcnc-dev/lib/hallib/sim_spindle_encoder.hal
Found file(lib): /home/seb/linuxcnc-hacking/linuxcnc-dev/lib/hallib/axis_manualtoolchange.hal
Found file(lib): /home/seb/linuxcnc-hacking/linuxcnc-dev/lib/hallib/simulated_home.hal
Found file(lib): /home/seb/linuxcnc-hacking/linuxcnc-dev/lib/hallib/check_xyz_constraints.hal
note: MAXV     max: 5.000 units/sec 300.000 units/min
note: LJOG     max: 5.000 units/sec 300.000 units/min
note: LJOG default: 0.250 units/sec 15.000 units/min
note: jog_order='XYZ'
note: jog_invert=set()

What does it all mean?
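
The manual comparison above (same import, different PYOPENGL_PLATFORM values) can be scripted. The sketch below runs the import in a fresh interpreter for each setting and reports whether it succeeded. The helper name `probe_import` is hypothetical; the values tried ("x11" and "osmesa") are the ones that appear in this thread, and the results will depend on the installed PyOpenGL version and session type.

```python
import os
import subprocess
import sys
from typing import Dict, Optional, Tuple


def probe_import(module: str, platform: Optional[str] = None) -> Tuple[bool, str]:
    """Try `from <module> import *` in a fresh interpreter.

    If `platform` is given it is passed as PYOPENGL_PLATFORM; otherwise any
    inherited value is removed so the library picks its own default.
    Returns (succeeded, last line of stderr).
    """
    env: Dict[str, str] = dict(os.environ)
    env.pop("PYOPENGL_PLATFORM", None)
    if platform is not None:
        env["PYOPENGL_PLATFORM"] = platform
    result = subprocess.run(
        [sys.executable, "-c", f"from {module} import *"],
        env=env,
        capture_output=True,
        text=True,
    )
    lines = result.stderr.strip().splitlines()
    return result.returncode == 0, (lines[-1] if lines else "")


if __name__ == "__main__":
    for choice in (None, "x11", "osmesa"):
        ok, message = probe_import("OpenGL.GLUT", choice)
        print(f"PYOPENGL_PLATFORM={choice!r}: {'ok' if ok else message}")
```

Running each import in a subprocess matters here because the platform is chosen when the bindings are first imported, so a single interpreter cannot try more than one setting.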

SebKuzminsky commented 1 year ago

This patch makes linuxcnc's OpenGL stuff work for me on Wayland on Bookworm:

diff --git a/scripts/linuxcnc.in b/scripts/linuxcnc.in
index f8d5f18471..1f2b5aa424 100644
--- a/scripts/linuxcnc.in
+++ b/scripts/linuxcnc.in
@@ -22,6 +22,8 @@ if test "xyes" = "x@RUN_IN_PLACE@"; then
     fi
 fi

+export PYOPENGL_PLATFORM=x11
+
 ################################################################################
 # 0. Values that come from configure
 ################################################################################
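
For a Python entry point, the same idea as the patch above could be sketched in-process rather than in the launcher script. This is an illustration, not the change that was merged; the function name `prefer_glx_platform` is hypothetical. Unlike the unconditional `export` in the patch, `setdefault` keeps a value the user has already set.

```python
import os


def prefer_glx_platform() -> str:
    """Default PYOPENGL_PLATFORM to "x11" unless the user already chose one.

    Must run before the first `import OpenGL...`, because PyOpenGL reads the
    variable when it selects its platform at import time.
    """
    return os.environ.setdefault("PYOPENGL_PLATFORM", "x11")


if __name__ == "__main__":
    chosen = prefer_glx_platform()
    print(f"PYOPENGL_PLATFORM={chosen}")
    # Only now would a GUI import the bindings, e.g. `from OpenGL.GL import *`.
```

Whether to override or merely default the variable is a design choice: defaulting leaves room for someone to test another platform without editing the script.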

rodw-au commented 1 year ago

Nice to know there is a solution to this recent development. I think as they bury Xorg, they are removing compatibility features. Does your patch let Gmoccapy work? Please check. Using Xfce is also a good solution, as it's an Xorg environment.

If the environment is changed in linuxcnc, I'd like to see if my Chromebook will run it again.

rodw-au commented 1 year ago

I compiled v2.9 as RIP on my Chromebook and set this environment variable: export PYOPENGL_PLATFORM=x11

The Chromebook runs a version of Bullseye with kernel 5.10. Axis failed with an OpenGL error.

I then ran this script, which is deployed with linuxcnc: ~/linuxcnc-dev/lib/python/qtvcp/designer/install_script
It is mentioned in the docs here: http://linuxcnc.org/docs/2.9/html/plasma/qtplasmac.html#qt-dependency

I ran qtplasmac, which is a qtvcp config, and the program opened but complained about a missing dependency: "Is python3-gst1.0 installed?" That package does not exist, but I found that python3-gst-1.0 exists in Bullseye, so I installed it. This resolved the missing dependency. I ran Axis again and it opened perfectly!

So a request: can the qtvcp dependencies be added to the list linuxcnc knows about when you run dpkg-checkbuilddeps, as per the docs here: http://linuxcnc.org/docs/2.9/html/code/building-linuxcnc.html#Satisfying-Build-Dependencies
Surely qtvcp should be considered part of the main line of linuxcnc?

Anyway, it's great that I can now use my Chromebook to run sims to test stuff. It's been broken (by the same issue, it seems) for over 12 months.

SebKuzminsky commented 1 year ago

The tip of master works on Bullseye but fails as described above on Bookworm. The important difference seems to be that Bullseye has python3-opengl 3.1.5 (which works), but Bookworm has python3-opengl 3.1.6 (which fails).

3.1.6 went into debian in mid-November, so I expect it's been broken since then.

If I install 3.1.5 from snapshots (http://snapshot.debian.org/binary/python3-opengl/) on Bookworm, Axis runs again. (I had to install it with dpkg -i --force-depends, because python3-opengl 3.1.5 Depends on freeglut3, which in Bookworm has transitioned to libglut3.12).

The important difference between python3-opengl 3.1.5 (which works) and 3.1.6 (which doesn't work) is in the detection and selection of the "platform" it uses. 3.1.5 selects the "GLX" platform, but 3.1.6 selects the "EGL" platform:

3.1.5:

3.1.6:

Just like the error message says, the EGL platform lacks the getGLUTFontPointer function: https://github.com/mcfletch/pyopengl/blob/3e9791ffb4cd4831dae261d6bea3049ce9e78f01/OpenGL/platform/egl.py

Unlike the GLX platform, which has that function: https://github.com/mcfletch/pyopengl/blob/3e9791ffb4cd4831dae261d6bea3049ce9e78f01/OpenGL/platform/glx.py#L97
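
One way to picture the behavior described above is a toy model of platform selection. The sketch below is not PyOpenGL's code: the lookup table, the priority order (PYOPENGL_PLATFORM, then XDG_SESSION_TYPE, then `sys.platform`), and the function name `model_platform_choice` are all assumptions, chosen only to be consistent with what this thread reports (a Wayland session ends up on EGL, while an X11 session or PYOPENGL_PLATFORM=x11 ends up on GLX).

```python
import sys
from typing import Mapping, Optional

# Assumed, simplified lookup table for illustration only; this is not copied
# from PyOpenGL and may not match any particular release.
ASSUMED_BACKENDS = {
    "x11": "GLX",
    "wayland": "EGL",
    "egl": "EGL",
    "osmesa": "OSMesa",
    "linux": "GLX",
}


def model_platform_choice(
    env: Mapping[str, str], sys_platform: str = sys.platform
) -> Optional[str]:
    """Toy model: the first recognised key wins, in the assumed priority
    order PYOPENGL_PLATFORM, then XDG_SESSION_TYPE, then sys.platform."""
    candidates = (
        env.get("PYOPENGL_PLATFORM", ""),
        env.get("XDG_SESSION_TYPE", ""),
        sys_platform,
    )
    for key in candidates:
        backend = ASSUMED_BACKENDS.get(key.lower())
        if backend is not None:
            return backend
    return None


if __name__ == "__main__":
    print(model_platform_choice({"XDG_SESSION_TYPE": "wayland"}, "linux"))
    print(model_platform_choice(
        {"XDG_SESSION_TYPE": "wayland", "PYOPENGL_PLATFORM": "x11"}, "linux"
    ))
```

Under this model an explicit PYOPENGL_PLATFORM always takes precedence over the session type, which would explain why setting it to x11 restores the GLX behavior even inside a Wayland session.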

After digging around for a bit, it's not totally surprising that this bug made it into pyopengl, and hasn't been noticed or fixed yet -- the pyopengl project is even more starved for developers than LinuxCNC. This is the most recent email on the pyopengl developers' mailing list: https://sourceforge.net/p/pyopengl/mailman/message/37278387/

This all makes me more willing to go with the fix/workaround in #2267 - it just restores the selection of the working GLX platform from 3.1.5.

petterreinholdtsen commented 1 year ago

[Sebastian Kuzminsky]

The tip of master works on Bullseye but fails as described above on Bookworm. The important difference seems to be that Bullseye has python3-opengl 3.1.5 (which works), but Bookworm has python3-opengl 3.1.6 (which fails).

Great to hear you have identified the relevant package and version. Is the issue registered on <URL: https://bugs.debian.org/src:pyopengl >? I could not find any obvious candidates. If not, we should report the issue there, as well as upstream.

3.1.6 went into debian in mid-November, so I expect it's been broken since then.

Would fit with my problem period, at least.

If I install 3.1.5 from snapshots (http://snapshot.debian.org/binary/python3-opengl/) on Bookworm, Axis runs again. (I had to install it with dpkg -i --force-depends, because python3-opengl 3.1.5 Depends on freeglut3, which in Bookworm has transitioned to libglut3.12).

I will test this too as soon as I can. Might take a few days.

-- Happy hacking Petter Reinholdtsen

SebKuzminsky commented 1 year ago

Reported upstream here: https://github.com/mcfletch/pyopengl/issues/89

Reported to Debian here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1029011

andypugh commented 1 year ago

https://xkcd.com/2347/

SebKuzminsky commented 1 year ago

With @swt2c's fix in https://github.com/mcfletch/pyopengl/pull/91 I now get a little further in launching Axis:

$ echo $XDG_SESSION_TYPE
wayland

$ linuxcnc -l
LINUXCNC - 2.10.0~pre0
Machine configuration directory is '/home/seb/linuxcnc/configs/sim.axis'
Machine configuration file is 'axis.ini'
Starting LinuxCNC...
linuxcnc TPMOD=tpmod HOMEMOD=homemod EMCMOT=motmod
Note: Using POSIX non-realtime
Found file(lib): /usr/share/linuxcnc/hallib/core_sim.hal
Found file(lib): /usr/share/linuxcnc/hallib/sim_spindle_encoder.hal
Found file(lib): /usr/share/linuxcnc/hallib/axis_manualtoolchange.hal
Found file(lib): /usr/share/linuxcnc/hallib/simulated_home.hal
Found file(lib): /usr/share/linuxcnc/hallib/check_xyz_constraints.hal
Traceback (most recent call last):
  File "/usr/bin/axis", line 62, in <module>
    from rs274.OpenGLTk import *
  File "/usr/lib/python3/dist-packages/rs274/OpenGLTk.py", line 16, in <module>
    import _togl
ImportError: /usr/lib/python3/dist-packages/_togl.cpython-310-x86_64-linux-gnu.so: undefined symbol: glXDestroyContext
Shutting down and cleaning up LinuxCNC...
task: 634 cycles, min=0.000015, max=0.005903, avg=0.001098, 0 latency excursions (> 10x expected cycle time of 0.001000s)
Note: Using POSIX non-realtime
LinuxCNC terminated with an error.  You can find more information in the log:
    /home/seb/linuxcnc_debug.txt
and
    /home/seb/linuxcnc_print.txt
as well as in the output of the shell command 'dmesg' and in the terminal

SebKuzminsky commented 1 year ago

A couple of thoughts here, from me who doesn't know the first thing about OpenGL:

  1. It looks like we're now using a mix of EGL and GLX, is that ok? Seems wrong.
  2. We have an old, old fork of the togl source in our repo, we should probably look into rewriting our _toglmodule & related build infrastructure to use debian's packaged libtogl and libtogl-dev instead.

swt2c commented 1 year ago

Sorry to butt in here, but since I'm here... :-)

A couple of thoughts here, from me who doesn't know the first thing about OpenGL:

1. It looks like we're now using a mix of EGL and GLX, is that ok?  Seems wrong.

Yes, that's probably not going to work. If you want to work natively on Wayland, you're going to have to use EGL. Otherwise, you could force things back to X11 and use GLX.

2. We have an old, _old_ fork of the togl source in our repo, we should probably look into rewriting our _toglmodule & related build infrastructure to use debian's packaged libtogl and libtogl-dev instead.

Assuming that _togl.cpython-310-x86_64-linux-gnu.so is your fork of togl, then yes, it appears to be linked with GLX.
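
To see which shared library actually provides a symbol such as glXDestroyContext, a small `ctypes` check can help. This is a sketch only: the helper name `has_symbol` is hypothetical, and the library names in the usage section are assumptions that vary between distributions.

```python
import ctypes
from typing import Optional


def has_symbol(library: Optional[str], symbol: str) -> Optional[bool]:
    """Report whether `library` exports `symbol`.

    Returns None if the library cannot be loaded at all. Passing None as
    the library searches the symbols already loaded into this process.
    """
    try:
        handle = ctypes.CDLL(library)
    except OSError:
        return None
    return hasattr(handle, symbol)


if __name__ == "__main__":
    # Library names are assumptions; they vary between distributions.
    for name in ("libGL.so.1", "libEGL.so.1"):
        print(name, has_symbol(name, "glXDestroyContext"))
```

Calling `has_symbol(None, "glXDestroyContext")` after importing the OpenGL bindings would show whether the symbol is visible in the running process, which is the situation the failing `_togl` import depends on.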

SebKuzminsky commented 1 year ago

Hi, nice to see you here! Thanks for the pyopengl fix, and for your advice on our OpenGL mess :-)

_togl.cpython-310-x86_64-linux-gnu.so is built from https://github.com/LinuxCNC/linuxcnc/blob/master/src/emc/usr_intf/axis/extensions/_toglmodule.c, which hilariously #includes our fork of togl.c...

swt2c commented 1 year ago

I'm sorry that I looked. ;)

Your togl code would probably need to grow EGL support, if you wanted to go that route. I looked quickly at Debian's togl and it doesn't look much better or newer. The togl project seems to be dead as far as I can see.

SebKuzminsky commented 1 year ago

We don't have the expertise or volunteer-hours available to switch our whole world from GLX to EGL currently, so it looks like I should reopen #2267 and advocate for that as our workaround for the near-term future.

Does that sound like the least-worst solution to you, @swt2c?

petterreinholdtsen commented 1 year ago

[Sebastian Kuzminsky]

We don't have the expertise or volunteer-hours available to switch our whole world from GLX to EGL currently, so it looks like I should reopen #2267 and advocate for that as our workaround for the near-term future.

Reading the story about the Firefox team's transition to EGL makes me suspect we might have to work on this soonish, <URL: https://mozillagfx.wordpress.com/2021/10/30/switching-the-linux-graphics-stack-from-glx-to-egl/ >.

-- Happy hacking Petter Reinholdtsen

SebKuzminsky commented 1 year ago

Here's a link-heavy overview of the OpenGL/GLX/EGL landscape on Unix, I found it useful: https://utcc.utoronto.ca/~cks/space/blog/linux/EGLAndGLXAndOpenGL?showcomments#comments

It sounds like we should immediately force LinuxCNC back to running on GLX (like we have been forever), instead of inconsistently trying to run partially on GLX and partially on EGL (like we accidentally started doing back in November).

We should then hope that one of us has the spoons to clean up our OpenGL mess and switch us from GLX to EGL, since that seems to be the way the future is going.

And maybe at the same time switch from OpenGL to OpenGL ES, to run better on tiny ARM machines which sometimes don't implement OpenGL but do implement OpenGL ES.

swt2c commented 1 year ago

We don't have the expertise or volunteer-hours available to switch our whole world from GLX to EGL currently, so it looks like I should reopen #2267 and advocate for that as our workaround for the near-term future.

Does that sound like the least-worst solution to you, @swt2c?

Yes.

petterreinholdtsen commented 1 year ago

[Sebastian Kuzminsky]

We don't have the expertise or volunteer-hours available to switch our whole world from GLX to EGL currently, so it looks like I should reopen #2267 and advocate for that as our workaround for the near-term future.

If I got it right, there are several different OpenGL-related problems with Bookworm at the moment: the Wayland GLX vs EGL issue (not reported anywhere), the GLES on Wayland issue (reported to Debian as <URL: https://bugs.debian.org/1029011 >), the GLUT/freeglut3 issue (reported as <URL: https://bugs.debian.org/1029936 > and perhaps <URL: https://bugs.debian.org/590452 >), and the glBitmap issue (not reported anywhere).

Did I get it right? Are there other problems too? Can someone who is able to reproduce the various issues make sure they are reported to Debian and/or upstream?

-- Happy hacking Petter Reinholdtsen

JetForMe commented 1 year ago

I just installed the Jan 23 Debian 12 "testing" build (pretty sure it was dated Jan 23), and am running into these same issues. I'm willing to try to help out with this, but first, how do I properly install pyopengl mentioned here?

swt2c commented 1 year ago

I just installed the Jan 23 Debian 12 "testing" build (pretty sure it was dated Jan 23), and am running into these same issues. I'm willing to try to help out with this, but first, how do I properly install pyopengl mentioned here?

Just update your Debian testing. That fix is now in testing.

JetForMe commented 1 year ago

Is that this?

$ dpkg -l | grep -i opengl
…
ii  python3-opengl                          3.1.6+dfsg-2                    all          Python bindings to OpenGL (Python 3)

I have no other updates available.

swt2c commented 1 year ago

On Mon, 30 Jan 2023, Rick M wrote:

Is that this?

$ dpkg -l | grep -i opengl
…
ii  python3-opengl                          3.1.6+dfsg-2                    all          Python bindings to OpenGL (Python 3)

Yes.

JetForMe commented 1 year ago

Thank you for the confirmation. I'm seeing strange behavior: I get the "Platform does not define a GLUT font retrieval function" error when logged into the VM console (I'm doing this in a Parallels VM on my Mac), but not when I'm logged in via ssh -Y. The environment is slightly different (e.g. WAYLAND_DISPLAY= is set on the console but not over ssh), but I don't know this stuff well enough to understand why.
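
When two logins behave differently like this, comparing the display-related variables from each can narrow things down. The sketch below parses two saved `env` dumps and lists the variables that differ. The function names and the list of variables considered relevant are assumptions made for illustration.

```python
from typing import Dict, Tuple

# Variables assumed to be relevant to display/GL backend selection.
DISPLAY_VARS = (
    "DISPLAY",
    "WAYLAND_DISPLAY",
    "XDG_SESSION_TYPE",
    "PYOPENGL_PLATFORM",
    "GDK_BACKEND",
)


def parse_env_dump(text: str) -> Dict[str, str]:
    """Parse the output of `env` (one NAME=value per line) into a dict."""
    parsed: Dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name:
            parsed[name] = value
    return parsed


def display_differences(
    first: Dict[str, str], second: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Return {name: (value_in_first, value_in_second)} for differing variables.

    A variable that is unset is reported as "<unset>".
    """
    differences = {}
    for name in DISPLAY_VARS:
        a = first.get(name, "<unset>")
        b = second.get(name, "<unset>")
        if a != b:
            differences[name] = (a, b)
    return differences
```

A typical use would be to run `env > console.txt` on the VM console and `env > ssh.txt` in the ssh session, then pass the contents of both files through `parse_env_dump` and `display_differences`.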

NTULINUX commented 1 year ago

PYOPENGL_PLATFORM=x11 linuxcnc
export PYOPENGL_PLATFORM=x11 ; linuxcnc

Nothing is working here for me..

edit: Happens on both LXQt and XFCE.

edit2: Note: Using Gentoo here; no Wayland, only X server.

JetForMe commented 1 year ago

@NTULINUX Do you have the fix to python3-opengl? That fixed the error this issue is about. But to get it to work all the way I also had to set export GDK_BACKEND=x11.

SebKuzminsky commented 1 year ago

This issue is "fixed" for now, by a combination of python3-opengl 3.1.6+dfsg-2 and #2314.

Many thanks to @swt2c for the pyopengl fix and for lending his expertise here, and thanks to @jepler for the #2314 workaround!

Gird your loins, linuxcnc hackers: we have lots of OpenGL work to do in the near future...

andypugh commented 1 year ago

Loins girded, I am trying to clear my pressing projects to make room.

jepler commented 1 year ago

Thank you Andy!