Papierkorb / qt5.cr

Qt5 bindings for Crystal, based on Bindgen
Mozilla Public License 2.0
212 stars 20 forks source link

GC frees objects and signal handlers #49

Open f-fr opened 3 years ago

f-fr commented 3 years ago

Consider the following simple example, a window with a single button whose clicked signal triggers the GC (in this example explicitly for illustration, but it may run automatically as well and cause the same problem).

require "qt5"

class MyWidget < Qt::Widget
  def initialize
    super

    # Variant 1: no instance variables *CRASHES*
    button = Qt::PushButton.new("Press Me")
    button.on_clicked do
      puts "Click"
      a = [0] * 10000000 # just to trigger the GC sooner
      GC.collect
    end

    layout = Qt::VBoxLayout.new
    layout.add_widget(button)
    self.layout = layout

    resize(400, 300)
  end
end

qApp = Qt::Application.new
win = MyWidget.new
win.show
Qt::Application.exec

The program crashes after clicking a few times on the button (3 times in may case). The reason seems to be (if I understand everything correctly) that the GC frees the objects instantiated in the constructor, namely the button, the layout AND the call (i.e. the proc bound to on_clicked). Given that Qt is a C++ library with different memory management strategies, this may not be that surprising (although certainly unexpected for a Crystal-dev without knowledge of the C++-Qt-stuff).

Anyway, a simple workaround would be to keep a reference to the objects in instance variables (just the code of initialize):

    # Variant 2: button and layout in instance variables *CRASHES*
    @button = Qt::PushButton.new("Press Me")
    @button.on_clicked do
      puts "Click"
      a = [0] * 10000000 # just to trigger the GC sooner
      GC.collect
    end

    @layout = Qt::VBoxLayout.new
    @layout.add_widget(@button)
    self.layout = @layout

Unfortunately, this also crashes. It took me some time to find out (because it was not obvious to me): connecting the callback to the clicked signal via the on_clicked method creates an internal object (somewhere in the bindgen generated binding code) and no reference to that object seems to be stored somewhere. The problem is that there does not seem to be a way the get this reference. Therefore I copied the binding code directly:

    # Variant 3: button, layout and callback in instance variable *WORKS*
    @button = Qt::PushButton.new("Press Me")
    @cb = Proc(Bool, Void).new do
      puts "Click"
      a = [0] * 10000000 # just to trigger the GC sooner
      GC.collect
    end
    Qt::Binding.bg_QAbstractButton_CONNECT_clicked_CrystalProc_void__bool_(@button, Qt::BindgenHelper.wrap_proc(@cb))

    @layout = Qt::VBoxLayout.new
    @layout.add_widget(@button)
    self.layout = @layout

    resize(400, 300)

Just to make sure that it is not sufficient to store only the callback, the following variant (which stores the callback but neither the button nor the layout) also crashed (after a few more clicks, 10 or so):

    # Variant 4: callback in instance variable *CRASHES* (after 10 clicks or so)
    button = Qt::PushButton.new("Press Me")
    cb = Proc(Bool, Void).new do
      puts "Click"
      a = [0] * 10000000 # just to trigger the GC sooner
      GC.collect
    end
    Qt::Binding.bg_QAbstractButton_CONNECT_clicked_CrystalProc_void__bool_(button, Qt::BindgenHelper.wrap_proc(cb))

    layout = Qt::VBoxLayout.new
    layout.add_widget(button)
    self.layout = layout

So may main questions is: is there another proper way to work with signals? Did I do something totally wrong? I could not find anything about memory management in docs (but I may have overlooked) ...

I've used the current master branch and Qt 5.15 on linux.

docelic commented 3 years ago

Heya, thanks for the report. It looks like something @Papierkorb would be most qualified to comment on. Stefan do you have some feedback? Thanks!

f-fr commented 3 years ago

I played a little bit with the code. For some reason it is okay to keep only the layout in an instance variable. The button does not have to be. Consider the following example (without the callback, which is a different story):

# Variant 5: no callback, no instance variables *CRASHES*
# Don't click the button but somewhere else in widget
class MyWidget < Qt::Widget
  def initialize
    super

    button = Qt::PushButton.new("Press Me")
    layout = Qt::VBoxLayout.new
    layout.add_widget(button)
    self.layout = layout

    resize(400, 300)
  end

  def mouse_release_event(ev)
    puts "Click"
    GC.collect
  end
end

This crashed. If we keep the layout in an instance variable, everything works:

# Variant 6: no callback, layout in instance variables *WORKS*
# Don't click the button but somewhere else in widget
class MyWidget < Qt::Widget
  def initialize
    super

    button = Qt::PushButton.new("Press Me")
    @layout = Qt::VBoxLayout.new
    @layout.add_widget(button)
    self.layout = @layout

    resize(400, 300)
  end

  def mouse_release_event(ev)
    puts "Click"
    GC.collect
  end
end

And to make everything even more confusing: the described problems appear when compiling the test program in "debug" mode. However, variants 4 and 5 do work (without crash) when compiled in "release" mode (i.e. keeping the button and/or the layout in instance variables is no longer necessary, the issue with the callback persists, however).

Papierkorb commented 3 years ago

Hello. I haven't used Crystal in three years at this point, and considering the effort it'd take me to fix this .. I don't think I'm willing to dedicate so much time anymore to FOSS stuff now or for years to come. Sorry, but I'm out...

Though @f-fr is on the right track I think. The pointers/objects passed to Qt seem to vanish from the GC, which'll then do its job and destroy them at some point. Keeping them around in Crystal-land thus works around this issue. Iirc, there's something similar with Qt signals, but honestly I don't remember anymore under which circumstances it's triggered.

docelic commented 3 years ago

Sure Stefan, no worries. I mean even just your comments/thoughts are valuable & appreciated by everyone. Thanks!

docelic commented 3 years ago

@f-fr On a side note, did you have to make any other changes in Bindgen or Qt5 bindings to get it to compile & run the test program? If you did, even if they are not polished it would be great to submit them as PRs. Thanks!

f-fr commented 3 years ago

The only thing I had to do is the hint from #48 (removing that one line).

f-fr commented 3 years ago

Unfortunately the PR #53 does not solve this issue. If I understand it correctly, there is another problem: Qt objects that are not created by qt5.cr, i.e. everything allocated somewhere within Qt, typically lives in memory that is not traced by the GC. Hence, if we create some object from crystal, say a button, and add it to some widget, the widget will contain a pointer to that button. But because the widget's data is not traced by the GC, this pointer will not be seen by the GC. If we do not keep the button in an instance variable in Crystal, there will not be a GC-visible reference to the button and it will be deleted.

I have no idea if there is a way to make Qt-memory traced by the GC. Otherwise one would need a more complex wrapper object ...

f-fr commented 3 years ago

I worked a little on the memory management issue. You can see the efforts in this branch of bindgen and this branch of qt5.cr.

The bindgen commit basically changes every return type and argument type of the generated glue functions from T to typename bg_type<T>::type. By default this is just T, so absolutely nothing changes. But this type wrapper gives the opportunity to hook into the generated code.

This is then used in the qt5.cr code. The basic idea is as follows: for each class that is part of Qt's parent based memory management (i.e. basically everything inheriting from QObject, but there also also a few others like QListWidgetItem) we add another wrapper around the pointer to an instantiated object. The problem with the GC and the approach suggested in PR #53 is that Qt objects will then be managed by the GC, but their memory is still not traced. Now only the new wrapper class is GC managed, the Qt object is completely unmanaged/traced by the GC. In its destructor the wrapper also deletes the associated Qt object it points to unless that object has a parent. In this case the GC will only remove the wrapper but the Qt object will live (und eventually be destroyed if its parent is deleted). Furthermore, because the Qt object is kept in a QPointer, the wrapper will detect if the object it manages is freed (by Qt through the parent-stuff) but the wrapper still lives (because it has a reference somewhere in Crystal world, for instance). This should (hopefully) abort the program with a proper error message. The whole stuff is implemented with some C++ template voodoo around the aforementioned bg_type wrapper.

Everything seems to work quite fine so far. However, there are few rough edges:

  1. some classes (mostly private Qt classes not belonging to the public API) need to be removed from the memory-voodoo. This is done using the BG_NO_WRAP macro in binding_helpers.hpp. It would probably be better if methods that have some argument of some of these private types are completely ignored by bindgen's code generator (they cannot be used from Crystal anyway because, well, they are private, internal classes in the end). However I do not know how to prevent bindgen from generating glue code for function not only based on their name (this can be done with ignore_methods in the configuration files) but also on the type of the arguments.
  2. Some classes (like QListWidgetItem) have their own "hierarchy" ... it is not a subclass of QObject but is managed by the associated QListWidget it is part of. Currently there is special code for this case in binding_helpers.hpp but maybe this should be more genericly configurable.
  3. It is possible that two different wrappers to the same Qt object exist, e.g. by creating a Qt::ListWidgetItem and later getting a reference to the same item via Qt::ListWidget#item(...). This is not a problem if the item has a parent. However, if the item does not have a parent and one of the two wrapper objects is GC-collected, the Qt object will also be collected. This could be prevented by keeping track of existing wrappers, but this would probably increase the wrapper code significantly (but I'm not sure). Maybe it is sufficient to document this behaviour because it won't happen too often in real practice (I hope). For instance, in the Qt::ListWidget example each item will usually be assigned to some list and so the parent based memory management has priority anywar.
  4. There is the special QCoreApplication class (or one if its subclasses). These classes do not have a parent (although they are QObject subclasses) because they typically are the "root" of the object tree. The problem is as follows. A typical "main code" for a Qt-app is
    app = Qt::Application.new
    win = MainWindow.new
    win.show
    Qt::Application.exec

    Now if compiled with "--release" then Crystal is too clever and deduces that "app" is not used anymore after the initialization and immediately throws it away. Qt will then complain (when creating the MainWindow) that no application object has been created. Of course, Qt would internally keep a global reference to this application object (accessible through Qt::Application::instance, but this is not visible to the GC.

A simple workaround is to access the application object after the exec line, e.g.

app = Qt::Application.new
win = MainWindow.new
win.show
Qt::Application.exec
app.style_sheet # useless but prevents the GC from collecting the application object

In C++ one would keep the application object on the stack or behind a QScopedPointer or so. In Crystal one probably needs different approach (but I have no idea right now, maybe some funny "block" syntax Qt::Application.run do |app| ... end or so).

The code works in my tests but it is largely untested. I would really appreciate any comments or improvements.

docelic commented 3 years ago

In the past @kalinon and @ZaWertun had good contributions. Maybe they have some insight.

docelic commented 3 years ago

Or maybe @HertzDevil would have a comment, this could be interesting enough to summon his wisdom :)

f-fr commented 3 years ago

As another step: I added another commit to bindgen (see this branch). This one modifies the generated code for binding to Qt signals. The Crystal proc data is moved to GC traced memory and then passed to the callback lambda bound to the Qt signal. That way the Crystal proc will not be removed by the GC as long as the lambda exists (i.e. as long as it is bound to signal). This should solve the GC issue with signals. The downside is that it requires C++14 (it is certainly possible with C++11 too, but probably a little bit more complicate and requires more changes to bindgen, and maybe it's not worth the effort since C++14 should be pretty standard nowadays).

If one wants to test all of these GC-related changes, just use the integrate branch of my qt5.cr fork.

kalinon commented 3 years ago

I ran into this same issue, Ill try to give this issue a review this week. Nice job!

kalinon commented 3 years ago

So im currently running into this problem using your branch. going to continue to look into it, but looks like the generator re-defining Qt::Application

Called macro defined in lib/qt5/src/qt5/binding.cr:6:1

 6 | {% begin %}

Which expanded to:

 > 1 | 
 > 2 |   
 > 3 |   require "./binding/binding_linux-gnu-x86_64-qt5.15"
Error: while requiring "./binding/binding_linux-gnu-x86_64-qt5.15"

In lib/qt5/src/qt5/binding/binding_linux-gnu-x86_64-qt5.15.cr:39581:9

 39581 | class Application < GuiApplication
               ^
Error: superclass mismatch for class Qt::Application (Qt::GuiApplication for Reference)
f-fr commented 3 years ago

Well, I didn't change anything at the generated crystal code. If this is is a superclass mismatch ... where is the other definition for this superclass (i.e. there should be another "class Application < ..." somewhere)?

Or are there two different definitions for "GuiApplication"?

Maybe there are multiple generated binding files (i.e. multiple files of the form "binding_linux-gnu-......cr")? These are the generated Crystal files and, AFAICT, the file "binding.cr" includes everything it could find. Maybe deleting all of them and then rerunning the generator solves the problem.

kalinon commented 3 years ago

I'm thinking this may be a change in crystal 1.1 or 1.2. I did not see any definitions of Qt::Application other than 1 in the generator and 1 in src

 39581 | class Application < GuiApplication

and

module Qt
  class Application

I think because the src/qt5/application.cr does not inherit from GuiApplication it is causing the issue.

kalinon commented 3 years ago

Getting past that error leads me to a second:

Called macro defined in lib/qt5/src/qt5/binding.cr:6:1

 6 | {% begin %}

Which expanded to:

 > 1 | 
 > 2 |   
 > 3 |   require "./binding/binding_linux-gnu-x86_64-qt5.15"
Error: while requiring "./binding/binding_linux-gnu-x86_64-qt5.15"

In lib/qt5/src/qt5/binding/binding_linux-gnu-x86_64-qt5.15.cr:69582:48

 69582 | include BindgenHelper::SequentialContainer(Variant)
                                                    ^------
Error: undefined constant Variant
f-fr commented 3 years ago

Really I don't know. I use crystal 1.2.2 and qt 5.15. Did you start with a fresh install? If you refer to my "integrate" branch of qt5.cr from some fresh project's shard.yml and run "shards update" then it should compile qt5.cr with your system's qt dev files. This is what I do and I get no error of that kind.