crosswire / xiphos

Xiphos is a Bible study tool written for Linux, UNIX, and Windows using GTK, offering a rich and featureful environment for reading, study, and research using modules from The SWORD Project and elsewhere.
http://xiphos.org
GNU General Public License v2.0

No text display #1125

Closed BeForgiven-Info closed 11 months ago

BeForgiven-Info commented 1 year ago

Some update around 5/4/23 made xiphos stop working. Using Debian 11, XFCE 4.16, but LXDE has the same problem. On another computer using the default Debian graphic code, xiphos works. But on a computer using NVIDIA drivers (for good and necessary reasons) Xiphos comes up but none of the Bible text shows. All updates are complete, tried rebooting, no search helps. Other Sword front ends work (Bible Desktop and Ezra). If I launch Xiphos from command line, I see the following error that does not show on the other computer: (xiphos:3049): Gtk-CRITICAL **: 05:07:01.746: gtk_box_pack: assertion '_gtk_widget_get_parent (child) == NULL' failed

Any help would be much appreciated.

And, by the way, is Xiphos still being maintained?

scottonanski commented 1 year ago

The error message you're seeing typically happens when you're trying to add a widget to a container and that widget is already a child of another container in a GTK+ application. In GTK+, a widget can only have one parent container at a time. If you try to add a widget to a container while it's already in another container, you'll see the error message you posted.

Here are a few ways you might resolve this issue:

Remove the widget from the current parent before adding it to another container. This can be done using the gtk_container_remove function.

Here's an example of how you can do it in C:

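g_object_ref(widget); /* hold a reference so that removal does not destroy the widget */
GtkContainer *old_parent = GTK_CONTAINER(gtk_widget_get_parent(widget));
gtk_container_remove(old_parent, widget);

And then you can add the widget to another container, calling g_object_unref(widget) afterwards to release the extra reference.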

Create a new instance of the widget. This may be more applicable in situations where you don't have control over the previous parent, or where you want to maintain the widget in its current parent as well.

Here is an example of creating a new button and adding it to a container:

GtkWidget *button = gtk_button_new_with_label("Button");
gtk_container_add(GTK_CONTAINER(container), button);

Without more specific code context, it's hard to give a precise solution, but these general pointers should help you figure out what's going wrong.

oscen0 commented 11 months ago

I've had the same problem since around the time xfce4 was updated to 4.18 on Fedora 38 x86_64 (the latest is xfce4 4.18.3, updated on 6/16/2023). I also use the Nvidia proprietary video driver, 535.113.01, and xiphos 4.2.1 from the Fedora main repository, whose build is dated 1/20/2023.

In my case, I don't see any error message when starting xiphos from command-line.

One thing I notice is that when I press down the mouse button in the blank text area and slide the mouse, I can see the text of the verse chosen with all correct colors, moving with the cursor, but as soon as I release the button, the text disappears.

I tried deleting ~/.xiphos and ~/.local/share/xiphos and starting from scratch, but it doesn't help.

karlkleinpaste commented 11 months ago

I was running F37 until 3 weeks ago, where Xiphos had no problems.

Then I updated to F39beta (I don't usually install betas, but timing necessitated it this time), where I find Xiphos does not display properly. I am on nvidia-enabled machines exclusively, though it's not clear nvidia is the source of the problem. I made a clean install of minimal F39beta in a VM, where Xiphos runs fine. I do not yet know what the difference is in that minimal environment (barely more than base install + Sword + Xiphos + BibleSync) vs. my regular, heavyweight, many-packages-installed environment.

Just now, I have discovered that if I run Xiphos inside an XPRA+VGL session, it displays correctly again. This is peculiar from the nvidia point of view, because VGL explicitly links nvidia libraries in the mix, but does so through its "faker" libraries.

I do not yet know what to make of any of this. I will need to do some comparison of Xiphos' linked libraries during execution in all of [a] my regular heavyweight world, [b] minimal install world, and [c] heavyweight-but-under-VGL world.

karlkleinpaste commented 11 months ago

XPRA's involvement is not relevant. Simply running vglrun xiphos is sufficient for correct display.

This is darn peculiar, and one heck of a workaround.

karlkleinpaste commented 11 months ago

I should explain...

VGL is VirtualGL. VGL is an interception/redirection facility, by which to engage local-to-the-app GPU support in (what are typically) remotely-viewed applications. That is, when not operating a display on the same machine as where computation is taking place, VGL exists so that complicated, data-heavy rendering requests are not sent over the wire to a remote, network-slow, and possibly stupid X server. This applies to both remote desktop facilities like VNC and XPRA as well as ssh sessions with X forwarding.

VGL uses LD_PRELOAD to insert its faker libraries, libdlfaker.so and libvglfaker.so, intercepting linkage to GL, xcb, and some other X calls, shipping them instead to the local GPU, either by its old pre-3.x method of borrowing an existing X session (VGL_DISPLAY=:0) or by its 3.x-and-later method of speaking directly to the GPU (VGL_DISPLAY=egl). Then only completed frames are sent to the actual X server, which uses vastly less bandwidth and expects no consequential rendering at that X server.

Now... As to why this would have any effect at all on the ability of Xiphos to display simple text widgets... I am completely mystified at this time.

karlkleinpaste commented 11 months ago

@ArrayBolt3 might you have any input on this? Since you've recently expressed interest, I wonder if we have a misinteraction in choice of libs to be linked...except that we're clearly not in complete control of those choices, considering that both minimal and VGL-assisted environments make it work without having any special new control.

I love puzzles, I hate mysteries.

karlkleinpaste commented 11 months ago

OK... VGL or XPRA is sufficient to make it display OK, including both together, of course. So there's something in the raw, unaltered DISPLAY=:0 case that's off.

BeForgiven-Info commented 11 months ago

I cannot say what changed when it suddenly quit working, but it appears that some Linux (Debian Bullseye/Bookworm) change/update caused it. I was running Bullseye at the time, and Bookworm did not fix it when I upgraded. Using Nvidia drivers (no choice). The problem does not occur with the default nouveau driver on a Buster/Bookworm VM, nor on a nearly identical Buster/Bookworm machine using nouveau. I would gladly switch drivers on the problem machine, but cannot. Stuck with nvidia. I do not know whether the change was in something concerning nvidia, X, or something else in Debian.

Tom Sullivan @.*** FAX: 815-301-2835

ArrayBolt3 commented 11 months ago

@karlkleinpaste Don't have much insight currently, but knowing that it happens on Debian Bookworm XFCE is very useful. I'll dig around when I get the time and see what might be happening. (I have to finish a package review in Fedora first and have some other business to attend before I get there.)

karlkleinpaste commented 11 months ago

So it's an nvidia problem, one way or another -- a hardware acceleration failure. I am so confused.

I got lists of linked library names from running Xiphos processes out of /proc/XXX/maps, for plain DISPLAY=:0 on my screen, for VGL, and for XPRA. I diff'd them to see which instance had gained/lost which libraries.

The following 4 libs figured prominently in being present in the plain DISPLAY=:0 run but missing from the VGL and XPRA runs. Moving any 1 of these out of its normal spot in /usr/lib64 makes Xiphos display successfully on my machine, while giving me a whine on stderr about acceleration failure.

libGLX_nvidia libnvidia-tls libnvidia-glcore libnvidia-allocator

The first is especially obvious; the others, less so to me, but I can't say I'm surprised, either. The whine is:

** (xiphos:422014): WARNING **: 16:27:05.406: Disabled hardware acceleration because GTK failed to initialize GL: No available configurations for the given RGBA pixel format.

Significantly, perhaps, when run under VGL, I get the same diagnostic and a functioning display.

I'm reasonably certain that Xiphos never makes any particular demands for pixel formats in the first place.

When run under XPRA, I get a functioning display but a different diagnostic:

libEGL warning: DRI2: failed to authenticate

Now, the whole point of VGL in particular is to ensure that all needed libs get handled for acceleration purposes. And for example I can vglrun glxgears and get a hyper-fast gears display, so VGL works, fundamentally speaking.

But then there are 2 issues at hand:

The observation "you can have functioning text widgets iff you don't manage to enable acceleration" makes no sense to me.

For the time being, yes, it's an awful workaround, but use of VGL makes Xiphos display OK.

ArrayBolt3 commented 11 months ago

@karlkleinpaste I bet we're running into a variant of this mess: https://bugs.webkit.org/show_bug.cgi?id=228268 If you do something like enable Night Mode or something similar, can you barely make out the text?

ArrayBolt3 commented 11 months ago

Also try setting WEBKIT_DISABLE_DMABUF_RENDERER=1 when running Xiphos, see if that makes any difference.

karlkleinpaste commented 11 months ago

That's a remarkable find. Thank you.

Yup, that's it. Not Xiphos' fault at all. [heavy sigh of relief]

Either WEBKIT_DISABLE_DMABUF_RENDERER=1 or WEBKIT_DISABLE_COMPOSITING_MODE=1 (see 4th comment from the end of that report) makes Xiphos display normally without announcing an acceleration failure.

My eyes glazed over pretty hard, reading that bug report. Do I understand correctly that this is a regression in GTK that is tickled by an nvidia driver (or lib) bug?

ArrayBolt3 commented 11 months ago

I believe it's a regression in WebKit (which is the engine Xiphos uses to render text). It looks like they rewrote the rendering engine and now it doesn't like NVIDIA anymore. Why exactly is a mystery, but falling back to the old renderer (or, as you found, disabling the "compositing mode", whatever that is) works around the issue.

Also, I believe the actual bug ended up being https://bugs.webkit.org/show_bug.cgi?id=261874. The first one I pasted was similar but not the same, AFAICT.

karlkleinpaste commented 11 months ago

Excuse me, yes, WebKit is what I meant.

In any event, this is resolved as unfixable by us, being Somebody Else's Problem. Closing. But we need to keep this in mind, because it's a certainty that we will hear about it again.

ArrayBolt3 commented 11 months ago

I mean, technically yes it's not our problem, but perhaps it would be nice to just add a setenv() call (assuming that will do the trick) that sets the needed environment var? One assumes that once WebKit gets rid of the older renderer, the env var will become a no-op and we can drop it at some point after that. Then even if everyone else's WebKit apps are rendering weird, ours won't at least.

edit: I'm realizing this is probably silly - the WebKit people will probably fix the bug and it's rather recent, so it's probably something we can and should ignore (or possibly document).

karlkleinpaste commented 11 months ago

It's a good point. If there is no fix forthcoming after a little while, we'll do that, add a setenv(). As you say, it's a current bug, being addressed in comments literally today. If it doesn't get an update "soon" (I don't want to try to quantify that now), then we'll take care of it ourselves.