Closed mgorny closed 1 year ago
Yes, I see it on a regular basis on the GitHub CI. But not on my computer. If you can find something to help improving the tests suit robustness, I would be happy to merge a patch, or give a hand :)
I'm rebuilding Python with debug symbols. What really doesn't seem to make sense is that display
is <mss.linux.LP_Display object at 0x7fa3d81daf90>
, so it's not None
.
I guess it is a underlying object of display
that may cause an issue 🤔
Do you reproduce if you add a sleep(1)
after Xvfb start?
Do you reproduce if you add a
sleep(1)
after Xvfb start?
I have the same Xvfb started two hours ago ;-).
Hmm, i suspect POINTER()
may hide an actual null pointer inside. Trying to figure out how to get its address/value.
Ah, it has a boolean value. So if I add assert self._handles.display
just below XOpenDisplay()
, I sometimes get a failed assertion — this is sufficient to prevent segfaults but if I understand your comment from the other bug correctly, shouldn't failing self.xlib.XOpenDisplay()
trigger an exception then?
I think the null pointer actually breaks Python before the error handler has a chance to do something. It may be a incorrect guess though.
Because the error handler will be called by ctypes
after the X function returns something. Here, the null pointer in POINTER()
crashes the X function call.
Actually, I think error handlers may not be used by XOpenDisplay
at all — the docs are so bad :-(.
In any case, what do you think about adding:
if not self._handles.display:
raise ScreenShotError(f"Unable to open display: {display!r}.")
While it doesn't solve the underlying issue, an explicit exception is still better than segv. I can submit a PR if you wish.
As stated in https://www.x.org/releases/X11R7.7/doc/man/man3/XOpenDisplay.3.xhtml:
If XOpenDisplay does not succeed, it returns NULL.
So yes, let's go with your fix, that seems like the way to go 👍🏻
Having a way to debug why it failed would be cool, but I don't see how we could get more details.
Oh, I now see that the test suite is using xvfbwrapper — at least partially, that is. I was confused because a lot of tests fail without DISPLAY
being set. I'm guessing the problem is indeed that xvfbwrapper.start()
returns too soon, and so adding a little delay should reduce the risk of failures.
That said, I'm a bit confused by this — apparently we need a display to run the test suite at all, so why some of the tests use an additional layer of Xvfb? Perhaps the simplest solution would be to just use the outer $DISPLAY
there instead of starting Xvfb. That should avoid the problem since the server is started earlier, and would avoid adding delays to all tests.
That said, the bug probably lies in xvfbwrapper itself. FWICS it's using a pretty "obsolete" method of starting Xvfb, compared e.g. to xvfb-run. Unfortunately, the package seems to be dead — there is an open PR actually fixing race conditions that received no reply.
So, what do you think about removing xvfbwrapper
use and using the Xvfb instance started as part of GHA? I could also try to switch GHA to use xvfb-run
to be even more reliable.
Hmm, now I see that xvfbwrapper is used to start the display at specific screen size.
In that case, perhaps PyVirtualDisplay could be a better alternative? In any case, the code there seems more robust, the package has had recent commits and it's used by pytest-xvfb, so I suppose it has some popularity.
Hmm, now I see that xvfbwrapper is used to start the display at specific screen size.
In that case, perhaps PyVirtualDisplay could be a better alternative? In any case, the code there seems more robust, the package has had recent commits and it's used by pytest-xvfb, so I suppose it has some popularity.
Good idea 👍🏻
Should I make a PR to switch or do you want to do it?
Should I make a PR to switch or do you want to do it?
If you have time to do it, let's go :)
Damn, it seems that pyvirtualdisplay also has race conditions :-(. I suppose I'll try to fix it first then.
If the check you added in the init is enough, then lets keep it as-is. It works pretty well :)
If the check you added in the init is enough, then lets keep it as-is. It works pretty well :)
Unfortunately, the check only replaces segv with some test failures. Besides, given that xvfbwrapper is unmaintained, I'd like to be able to remove it from Gentoo sooner than later (i.e. before it causes even more issues). I've created #249 to use PyVirtualDisplay — it also makes the code a bit shorter ;-).
Good argument about the removal from Gentoo, you should have started with that ;)
General information:
For GNU/Linux users:
Description of the warning/error
When running the test suite under Xvfb, various tests sometimes cause it to segfault seemingly randomly, e.g.:
gdb suggests that
XQueryExtension()
is receiving a NULL pointer as display:This seems to be some ugly race condition, I'm trying to investigate further.
Other details
I'm testing via: