Closed vtjnash closed 3 years ago
Is this a) reproducible and b) also happens in 1.6? I also have seen segfaults on 1.6-beta on an internal software of mine. A minimal reproducible example would be good to track this down.
Those are the tests run by PkgEval. Note that tests for nightly have been been failing forever since we switched to Github Actions: https://github.com/JuliaGraphics/Gtk.jl/actions?query=workflow%3ACI
Hm, yes. But nightly != 1.6 right?
Yes, but I was referring to point 1, tests have been consistently failing for months. Maybe looking at them might help. Julia v1.6 isn't tested at the moment in CI (but you just need to add one line to the matrix)
Hm, "Error: Could not find a Julia version that matches 1.6". Do you know what to enter there? 1.6-?
tests passing locally on Julia 1.6rc1
At the moment it's ^1.6.0-0
, it'll be 1.6.0
, 1.6
, or 1
when it'll be released
radio: Error During Test at /home/runner/work/Gtk.jl/Gtk.jl/test/gui.jl:340
170
Test threw exception
171
Expression: get_gtk_property(get_gtk_property(r, :active), :label, AbstractString) == choices[2]
172
no active elements in GtkRadioGroup
seems unrelated to the segfault though.
ok, its the same test failure as on nightly.
I do not get why this is failing on Julia 1.6 (CI) but not not on 1.3 (CI) and also not locally on 1.6.
Locally I get:
julia> get_gtk_property(get_gtk_property(r,:active),:label,AbstractString)
"choice two"
@giordano: Can you somehow identify if different Gtk binaries are used for different Julia versions?
also not locally on 1.6.
That's the fun part of running CI, isn't it? 😃
Note that you can SSH into the github actions runners for debugging with the step
- name: Setup tmate session
uses: mxschmitt/action-tmate@v3
with:
limit-access-to-actor: true
right before the run tests step (you can remove the last two lines if you don't want to use your ssh key to login). See https://github.com/marketplace/actions/debugging-with-tmate
TIL, thanks @giordano!
Ok, I have some findings and can even reproduce the issue locally. Here is my test program:
# file is called gui2.jl here locally
using Test, Gtk
using Gtk.ShortNames, Gtk.GConstants, Gtk.Graphics
numFailures = 0
for l=1:1000
choices = ["choice one", "choice two", "choice three", RadioButton("choice four"), Label("choice five")]
r = RadioButtonGroup(choices,2)
active = [get_gtk_property(b,:active,Bool) for b in r]
if sum(active) != 1
@show active
global numFailures += 1
end
end
@show numFailures
So the issue is with the calls where the vector of active elements is obtained. Sometimes (not always) I get the following run:
julia> include("gui2.jl")
(julia:6981): GLib-GObject-CRITICAL **: 20:43:05.590: g_object_ref_sink: assertion 'G_IS_OBJECT (object)' failed
(julia:6981): GLib-GObject-CRITICAL **: 20:43:05.596: g_object_get_property: assertion 'G_IS_OBJECT (object)' failed
(julia:6981): GLib-GObject-CRITICAL **: 20:43:05.596: g_object_ref_sink: assertion 'G_IS_OBJECT (object)' failed
(julia:6981): GLib-GObject-CRITICAL **: 20:43:05.596: g_object_get_property: assertion 'G_IS_OBJECT (object)' failed
(julia:6981): GLib-GObject-CRITICAL **: 20:43:05.596: g_object_ref_sink: assertion 'G_IS_OBJECT (object)' failed
(julia:6981): GLib-GObject-CRITICAL **: 20:43:05.596: g_object_get_property: assertion 'G_IS_OBJECT (object)' failed
active = Bool[0, 0, 0, 0, 0]
numFailures = 1
1
So it seems that something went wrong Gtk internally and then simply false
is returned even for the active element.
I do not get this error with Julia 1.5.
So these sort of spontaneous errors are not the one I am capable of tracking down since this certainly touches the core of the type conversion in done in the glib module. The interesting question is what has changed from Julia 1.5 to 1.6 that can lead to these non-deterministic sort of errors. Ping @vtjnash who is much better prepared to solve this and ping @KristofferC to make you aware of this regression.
I suspect it's an issue with custom widgets since RadioButtonGroup is one (see also #557). Maybe there's some issues here ?
@jonathanBieler, the RadioButtonGroup code doesn't use gobject_move_ref
: https://github.com/JuliaGraphics/Gtk.jl/blob/d743a8bb429d6b23d92be415d71b4b6f2fda72c5/src/buttons.jl#L64
If I change the rhs to gobject_move_ref(new(layout), layout)
, I get this:
(julia:85406): GLib-GObject-CRITICAL **: 07:54:00.060: g_object_set_qdata_full: assertion 'G_IS_OBJECT (object)' failed
(julia:85406): GLib-GObject-CRITICAL **: 07:54:00.060: g_object_ref_sink: assertion 'G_IS_OBJECT (object)' failed
(julia:85406): Gtk-CRITICAL **: 07:54:00.075: gtk_container_add: assertion 'GTK_IS_CONTAINER (container)' failed
(julia:85406): Gtk-CRITICAL **: 07:54:00.075: gtk_container_add: assertion 'GTK_IS_CONTAINER (container)' failed
(julia:85406): Gtk-CRITICAL **: 07:54:00.076: gtk_container_add: assertion 'GTK_IS_CONTAINER (container)' failed
(julia:85406): Gtk-CRITICAL **: 07:54:00.103: gtk_container_add: assertion 'GTK_IS_CONTAINER (container)' failed
(julia:85406): Gtk-CRITICAL **: 07:54:00.118: gtk_container_add: assertion 'GTK_IS_CONTAINER (container)' failed
all immediately upon calling RadioButtonGroup(choices,2)
.
EDIT: the assertion 'G_IS_OBJECT (object)' failed
are triggered at https://github.com/JuliaGraphics/Gtk.jl/blob/b61cb23845f5784ca6232b3f9afed40f1f91f065/src/GLib/gtype.jl#L494
and the assertion 'GTK_IS_CONTAINER (container)' failed
by each push!
: https://github.com/JuliaGraphics/Gtk.jl/blob/d743a8bb429d6b23d92be415d71b4b6f2fda72c5/src/buttons.jl#L74
OK, this diff gets us closer to the design suggested in https://juliagraphics.github.io/Gtk.jl/latest/manual/customWidgets/:
diff --git a/src/buttons.jl b/src/buttons.jl
index 4574ee9..e19b533 100644
--- a/src/buttons.jl
+++ b/src/buttons.jl
@@ -59,9 +59,9 @@ mutable struct GtkRadioButtonGroup <: GtkContainer # NOT a native @gtktype
# the behavior is specified as undefined if the first
# element is moved to a new group
# do not rely on the current behavior, since it may change
- handle::GtkContainer
+ handle::Ptr{GObject}
anchor::GtkRadioButton
- GtkRadioButtonGroup(layout::GtkContainer) = new(layout)
+ GtkRadioButtonGroup(layout::GtkContainer) = gobject_move_ref(new(unsafe_convert(Ptr{GObject}, layout)), layout)
end
const GtkRadioButtonGroupLeaf = GtkRadioButtonGroup
macro GtkRadioButtonGroup(args...)
@@ -88,7 +88,7 @@ function push!(grp::GtkRadioButtonGroup, e::GtkRadioButton)
else
grp.anchor = e
end
- push!(grp.handle, e)
+ container_add!(grp.handle, e)
grp
end
function push!(grp::GtkRadioButtonGroup, label, active::Union{Bool, Nothing} = nothing)
@@ -100,7 +100,7 @@ function push!(grp::GtkRadioButtonGroup, label, active::Union{Bool, Nothing} = n
if isa(active, Bool)
gtk_toggle_button_set_active(e, active::Bool)
end
- push!(grp.handle, e)
+ container_add!(grp.handle, e)
grp
end
function start_(grp::GtkRadioButtonGroup)
diff --git a/src/container.jl b/src/container.jl
index a07ee41..b6609ee 100644
--- a/src/container.jl
+++ b/src/container.jl
@@ -1,9 +1,11 @@
+container_add!(w, child) = ccall((:gtk_container_add, libgtk), Nothing, (Ptr{GObject}, Ptr{GObject},), w, child)
+container_remove!(w, child) = ccall((:gtk_container_remove, libgtk), Nothing, (Ptr{GObject}, Ptr{GObject},), w, child)
function push!(w::GtkContainer, child)
- ccall((:gtk_container_add, libgtk), Nothing, (Ptr{GObject}, Ptr{GObject},), w, child)
+ container_add!(w, child)
w
end
function delete!(w::GtkContainer, child::GtkWidget)
- ccall((:gtk_container_remove, libgtk), Nothing, (Ptr{GObject}, Ptr{GObject},), w, child)
+ container_remove!(w, child)
w
end
function empty!(w::GtkContainer)
Sometimes it succeeds, but I still get intermittent failures like
(julia:92982): GLib-GObject-CRITICAL **: 08:57:46.379: g_object_ref_sink: assertion 'G_IS_OBJECT (object)' failed
(julia:92982): GLib-GObject-CRITICAL **: 08:57:46.384: g_object_get_property: assertion 'G_IS_OBJECT (object)' failed
that seem to be reduced by adding various sleeps. What's the right way to do this?
I think you might need a GC.@preserve w
in places like https://github.com/JuliaGraphics/Gtk.jl/blob/d743a8bb429d6b23d92be415d71b4b6f2fda72c5/src/buttons.jl#L115
Ho ho! I hadn't noticed that, but that's a good catch.
I added it everywhere start_
gets used in buttons.jl
(there are other uses for other types but I didn't try it there). I still get intermittent errors like the above.
I think you're seriously on to something, though: if I wrap https://github.com/JuliaGraphics/Gtk.jl/blob/d743a8bb429d6b23d92be415d71b4b6f2fda72c5/test/gui.jl#L314-L342
in GC.enable(false); ... ; GC.enable(true)
it seems to pass every time.
There really does seem to be something different between 1.5 and 1.6. If you put
GC.gc(true); GC.gc(true); GC.gc(true)
right after the r = RadioButtonGroup(choices,2)
, then on 1.6 I get failures on alternating runs. (weird) It never seems to fail on 1.5.
As disclosure, I've also commented out most steps that run prior to the "radio" tests, and I've also removed the @testset
.
diff --git a/test/gui.jl b/test/gui.jl
index 024af93..e5a9b81 100755
--- a/test/gui.jl
+++ b/test/gui.jl
@@ -2,7 +2,7 @@
using Gtk.ShortNames, Gtk.GConstants, Gtk.Graphics
import Gtk.deleteat!, Gtk.libgtk_version, Gtk.GtkToolbarStyle, Gtk.GtkFileChooserAction, Gtk.GtkResponseType
-
+#=
## for FileFilter
# This is just for testing, and be careful of garbage collection while using this
struct GtkFileFilterInfo
@@ -309,8 +309,8 @@ set_gtk_property!(check,:label,"new label")
showall(w)
destroy(w)
end
-
-@testset "radio" begin
+=#
+# @testset "radio" begin
choices = ["choice one", "choice two", "choice three", RadioButton("choice four"), Label("choice five")]
w = Window("Radio")
f = Gtk.GtkBox(:v); push!(w,f)
@@ -325,7 +325,9 @@ showall(w)
destroy(w)
r = RadioButtonGroup(choices,2)
+GC.gc(true); GC.gc(true); GC.gc(true)
@test length(r) == 5
+GC.enable(false)
@test sum([get_gtk_property(b,:active,Bool) for b in r]) == 1
itms = Vector{Any}(undef,length(r))
for (i,e) in enumerate(r)
@@ -335,12 +337,17 @@ for (i,e) in enumerate(r)
e[1]
end
end
+GC.enable(true)
@test setdiff(choices, itms) == [choices[4],]
@test setdiff(itms, choices) == ["choice four",]
+GC.enable(false)
@test get_gtk_property(get_gtk_property(r,:active),:label,AbstractString) == choices[2]
+GC.enable(true)
w = Window(r,"RadioGroup")|>showall
destroy(w)
-end
+# end
+# GC.enable(true)
+error("stop")
@testset "ToggleButton" begin
tb = ToggleButton("Off")
Another observation: if I combine the above with disabling GC while the RadioButtonGroup
constructor runs, then I get the assertion 'G_IS_OBJECT (object)' failed
on every run. So it seems that GC needs to run during construction?
I am now onto other deadlines, but to leave this in a good state, here's a MWE:
using Revise # curiously, this seems to be important to making this fully reproducible, though CI fails without
using Gtk, Gtk.ShortNames, Test
choices = ["choice one", "choice two", "choice three", RadioButton("choice four"), Label("choice five")]
r = RadioButtonGroup(choices,2)
GC.gc(true); GC.gc(true); GC.gc(true)
@test length(r) == 5
@test sum([get_gtk_property(b,:active,Bool) for b in r]) == 1
This reliably passes on 1.5 and reliably fails on 1.6. If you comment out the GC.gc(true); GC.gc(true); GC.gc(true)
line then it seems to pass most or all the time on 1.6. The outcome is repeatable even in the same session, you don't need a restart.
Both are running Gtk 1.1.6.
I've opened an issue (see link above) that identifies https://github.com/JuliaLang/julia/pull/38180 as the probable source of the change. What's really odd is that Gtk doesn't use WeakKeyDict
(lots of WeakRef
s though), but there were also changes to gc.c
.
That changed WeakRef so they must be re-set by a finalizer if it wants to keep them active. E.g.
w = WeakRef(x)
finalizer(x) do x; keepalive(x) && w.value = x; end
Sorry, I don't understand. I don't find keepalive
in Base
, GC
, Gtk
, or Gtk.GLib
. Are there any docs on this?
I suppose I should have just said the_rest_of_the_finalizer(x)
OK. It would be ideal if someone who understands memory management in Gtk (@jonathanBieler ?) fixes this here, and @vtjnash adds the requisite NEWS entry (https://github.com/JuliaLang/julia/issues/39811#issuecomment-786599415). If @jonathanBieler can't fix it I can tackle it, but I'd like to wait for @vtjnash's NEWS entry first since the enhanced clarity will save me time (which is in short supply).
@timholy Sadly I don't understand much about memory management in Gtk.jl or Julia for that matter.
I don't see an obvious reason it should behave different. Someone might want to run this under rr
, so you watch where the g_object_unref
calls happen and see where it should be fixed.
Sorry @jonathanBieler, I had you mixed up with with @bfredl
who added much of the glib stuff but who hasn't been around a lot lately.
Unfortunately, every time I've tried to use rr
in the last two years I get segfaults that don't happen without it:
$ rr record ./julia-debug --args /tmp/gtk.jl
rr: Saving execution to trace directory `/home/tim/.local/share/rr/julia-debug-0'.
Segmentation fault
( I know this does not change anything, but a big "thank you" for investigating Tim )
https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_date/2021-02/11/GtkReactive.1.7.0-DEV-50742c7e42.log