GLibSharp / GtkSharp

.Net bindings for gtk+3 and above
Other
27 stars 21 forks source link

Fix segfault in GException finalizer #61

Closed ylatuya closed 2 years ago

ylatuya commented 2 years ago

Without the fix the following segfault can be reproduced in macOS

=================================================================
    Managed Stacktrace:
=================================================================
      at <unknown> <0xffffffff>
      at GLib.GException:g_clear_error <0x000a0>
      at GLib.GException:Finalize <0x00052>
      at System.Object:runtime_invoke_virtual_void__this__ <0x000ab>
=================================================================
The application was terminated by a signal: SIGABRT
thiblahute commented 2 years ago

Not sure to understand, g_clear_error needs a pointer to pointer, you patch is making us give it a simple pointer to me, what am I missing?

ylatuya commented 2 years ago

I overlooked the g_clear_error API, it requires indeed to pointer to a pointer. After changing this I didn't see the error anymore so I though it was a simple binding issue, although it seemed strange because it has been like this for ages. The error I am seeing is racy, I will investigate more to see the root cause.

ylatuya commented 2 years ago

Ok, after investigating a bit more, I can see that when everything works errptr is IntPtr.Zero in the Finalizer, which means someone else called g_clear_error. GException should probably keep a copy of the Error itself to keep it alive while the managed object is still alive. What do you think?

ylatuya commented 2 years ago

For more context, I reproduce this issue using the Gst.Discoverer API on a .pdf file. It posts an error message for a missing plugin, which is converted into a GException. I my test, I than dispose the Discoverer instance and when the GException finalizer is triggered it randomly ends with a segfault. It seems a race disposing Gst.Discoverer and the GException finalizer that would be prevented with a copy of the error. I am running the tests with the debugger to check if someone else is clearing the error. From the C# bindings perspective, GException should probably keep a copy of the GError since anyone could clear it at any point.

ylatuya commented 2 years ago

I believe my suspicions were correct: GstDiscover is clearing the error here I will change the patch to create a copy of the GError.

thiblahute commented 2 years ago

I believe my suspicions were correct: GstDiscover is clearing the error here I will change the patch to create a copy of the GError.

I guess it is the right approach in that particular case as:

 * @error: (allow-none) (type GLib.Error): #GError, which will be non-NULL
 *                                         if an error occurred during
 *                                         discovery. You must not free
 *                                         this #GError, it will be freed by
 *                                         the discoverer.

But it should depend on the context and how ownership is supposed to be handled in each case. I guess what is required is to copy the GError on signal when the parameter type has been marked with G_SIGNAL_TYPE_STATIC_SCOPE.

ylatuya commented 2 years ago

Excellent point, I didn't know there was such annotations to indicate whether the error should be freed or not.

Looking at the class Value implementation, it seems that when marshaling a value to a managed type, it always tries to set owned=false. For example ToBoxed tries to find a constructor with the signature (IntPtr, bool) and it will use it if available.

        object ToBoxed ()
        {
            IntPtr boxed_ptr = g_value_get_boxed (ref this);

            if (boxed_ptr == IntPtr.Zero)
                return null;

            Type t = GType.LookupType (type);
            if (t == null)
                throw new Exception ("Unknown type " + new GType (type).ToString ());
            else if (t.IsSubclassOf (typeof (GLib.Opaque)))
                return (GLib.Opaque) this;

            MethodInfo mi = t.GetMethod ("New", BindingFlags.Static | BindingFlags.Public | BindingFlags.FlattenHierarchy, null, new Type[] { typeof(IntPtr) }, null);
            if (mi != null)
                return mi.Invoke (null, new object[] {boxed_ptr});

            ConstructorInfo ci = t.GetConstructor (new Type[] { typeof(IntPtr), typeof (bool) });
            if (ci != null)
                return ci.Invoke (new object[] { boxed_ptr, false });

            ci = t.GetConstructor (new Type[] { typeof(IntPtr) });
            if (ci != null)
                return ci.Invoke (new object[] { boxed_ptr });

            return Marshal.PtrToStructure (boxed_ptr, t);
        }

I believed a proper fix could be adding a new owned parameter to the constructor for GException and clear the error only when owned=true, similar to how Opaque works

thiblahute commented 2 years ago

I believed a proper fix could be adding a new owned parameter to the constructor for GException and clear the error only when owned=true, similar to how Opaque works

Well, if we don't own it we have no control over its lifetime so if we just do that we might be using a pointer to a GError that was already destroyed I am afraid.

ylatuya commented 2 years ago

I believed a proper fix could be adding a new owned parameter to the constructor for GException and clear the error only when owned=true, similar to how Opaque works

Well, if we don't own it we have no control over its lifetime so if we just do that we might be using a pointer to a GError that was already destroyed I am afraid.

Since GError's have no refcounting to keep them alive, GException must make a copy of the GError if owned=false, since it's the only way to ensure that we won't be using a null pointer at any time.

ylatuya commented 2 years ago

PR updated using the owned parameter to determine of the error needs to be freed on object's finalizer.

ylatuya commented 2 years ago

@sdroege Can you please review this?

sdroege commented 2 years ago

Since GError's have no refcounting to keep them alive, GException must make a copy of the GError if owned=false, since it's the only way to ensure that we won't be using a null pointer at any time.

So shouldn't you take a copy here then? If owned=false you would now keep the original pointer around but there's nothing that would ensure that this original pointer's lifetime is at least as long as the GException wrapper object.

ylatuya commented 2 years ago

@sdroege I have changed the way GException are handled by copying required fields in the constructor and releasing the GError pointer only if needed. This does not require to have the GError alive anymore.