crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.47k stars 1.62k forks source link

Crash compiling code that just require xml without using it. #10435

Open hugopl opened 3 years ago

hugopl commented 3 years ago

The following code crashes with the following stacktrace:

require "xml"

@[Link("libvirt")]
lib LibVirt
  alias ConnectPtr = Void*

  fun connect_open = virConnectOpen(name : LibC::Char*) : ConnectPtr
  fun connect_close = virConnectClose(conn : ConnectPtr) : Int32
end

ptr = LibVirt.connect_open("qemu:///system")
LibVirt.connect_close(ptr)
#0  0x00007ffff758cef5 in raise () at /usr/lib/libc.so.6
#1  0x00007ffff7576862 in abort () at /usr/lib/libc.so.6
#2  0x00007ffff75cef38 in __libc_message () at /usr/lib/libc.so.6
#3  0x00007ffff75d6bea in  () at /usr/lib/libc.so.6
#4  0x00007ffff75d7fbc in _int_free () at /usr/lib/libc.so.6
#5  0x00007ffff75dbca8 in free () at /usr/lib/libc.so.6
#6  0x00007ffff7d7793b in  () at /usr/lib/libvirt.so.0
#7  0x00007ffff7d7c8a0 in  () at /usr/lib/libvirt.so.0
#8  0x00007ffff7e13155 in  () at /usr/lib/libvirt.so.0
#9  0x00007ffff7e138f3 in virConnectOpen () at /usr/lib/libvirt.so.0
#10 0x000055555558dc29 in __crystal_main () at /home/hugo/src/xmlbug/src/xmlbug.cr:11
#11 0x0000555555606176 in main_user_code () at /usr/lib/crystal/crystal/main.cr:110
#12 0x0000555555605fec in main () at /usr/lib/crystal/crystal/main.cr:96
#13 0x00005555555990a6 in main () at /usr/lib/crystal/crystal/main.cr:119

If I just comment the first line, require "xml", the code works correctly and no crash happens. The weird thing is that no XML code is explicit used by the example.

Tested with:

Crystal 0.36.1 (2021-02-03)

LLVM: 10.0.1
Default target: x86_64-pc-linux-gnu

libvirt 7.0.0-3 from ArchLinux package.
bcardiff commented 3 years ago

Requiring xml is enough to use it. It has some main initialization. Check references to LibXML.xmlGcMemSetup to find it.

hugopl commented 3 years ago

I guess I found the issue... libvirt also uses libxml2, so I think the library is getting double initied or something.

hugopl commented 3 years ago

yep, I reduced it to:

require "xml"

{% if compare_versions(Crystal::VERSION, "0.35.0-0") >= 0 %}
  @[Link("xml2", pkg_config: "libxml-2.0")]
{% else %}
  @[Link("xml2")]
{% end %}
lib LibXML
  fun xmlGcMemSetup(free_func : Void* ->,
                    malloc_func : LibC::SizeT -> Void*,
                    malloc_atomic_func : LibC::SizeT -> Void*,
                    realloc_func : Void*, LibC::SizeT -> Void*,
                    strdup_func : UInt8* -> UInt8*) : Int
end

LibXML.xmlGcMemSetup(
  ->GC.free,
  ->GC.malloc(LibC::SizeT),
  ->GC.malloc(LibC::SizeT),
  ->GC.realloc(Void*, LibC::SizeT),
  ->(str) {
    len = LibC.strlen(str) + 1
    copy = Pointer(UInt8).malloc(len)
    copy.copy_from(str, len)
    copy
  }
)

 @[Link("libvirt")]
 lib LibVirt
   alias ConnectPtr = Void*

   fun connect_open = virConnectOpen(name : LibC::Char*) : ConnectPtr
   fun connect_close = virConnectClose(conn : ConnectPtr) : Int32
 end

 ptr = LibVirt.connect_open("qemu:///system")
 LibVirt.connect_close(ptr)

So... it's not a Crystal bug, but IMO it deservers a mention on XML module documentation.

hugopl commented 3 years ago

I was able to use libvirt with Crystal libxml2 bindings by doing this workaround:

require "xml"

@[Link("libvirt")]
lib LibVirt
  alias ConnectPtr = Void*

  fun connect_open = virConnectOpen(name : LibC::Char*) : ConnectPtr
  fun connect_close = virConnectClose(conn : ConnectPtr) : Int32
end

def no_xmlgc
  LibXML.xmlGcMemSetup(->LibC.free,
    ->LibC.malloc(LibC::SizeT),
    ->LibC.malloc(LibC::SizeT),
    ->LibC.realloc(Void*, LibC::SizeT),
    ->(str) {
      len = LibC.strlen(str) + 1
      copy = Pointer(UInt8).malloc(len)
      copy.copy_from(str, len)
      copy
    })
  yield
  LibXML.xmlGcMemSetup(->GC.free,
    ->GC.malloc(LibC::SizeT),
    ->GC.malloc(LibC::SizeT),
    ->GC.realloc(Void*, LibC::SizeT),
    ->(str) {
      len = LibC.strlen(str) + 1
      copy = Pointer(UInt8).malloc(len)
      copy.copy_from(str, len)
      copy
    })
end

no_xmlgc do
  ptr = LibVirt.connect_open("qemu:///system")
  pp! LibVirt.connect_close(ptr)
end

pp! XML.parse("<test />")

Works... but is bad, because on multiple threads I believe that this need to be protected by a mutex or the caos will reign.... there's also the little overhead of 2 function calls.

So I think the correct to do this is let Crystal XML bindings don't use the GC, but manage its memory on some finalize calls, otherwise the GC will once again block the use a C library with Crystal, anyone agree?

hugopl commented 3 years ago

The use case were I found this was rewriting a small ruby JSON-RPC server used by the company I'm working to Crystal, the application uses libvirt and also threads, in Ruby it need to use multiple process sometimes (and it's a hell) to keep the server responsive during some slow libvirt operations, the same might happen on Crystal if using fibers in the same thread, anyway.... as the XML use I do is really simple, I can workaround it by calling LibXML.xmlNodeFree myself on Xml::Node like:

require "xml"

lib LibXML
  fun xmlFreeNode(cur : Node*) : Void
end

module XML
  struct Node
    def free
      LibXML.xmlFreeNode(@node)
    end
  end
end

@[Link("libvirt")]
lib LibVirt
  alias ConnectPtr = Void*

  fun connect_open = virConnectOpen(name : LibC::Char*) : ConnectPtr
  fun connect_close = virConnectClose(conn : ConnectPtr) : Int32
end

LibXML.xmlGcMemSetup(->LibC.free, ->LibC.malloc(LibC::SizeT), ->LibC.malloc(LibC::SizeT), ->LibC.realloc(Void*, LibC::SizeT), ->(str) {
  len = LibC.strlen(str) + 1
  copy = Pointer(UInt8).malloc(len)
  copy.copy_from(str, len)
  copy
})

ptr = LibVirt.connect_open("qemu:///system")
pp! LibVirt.connect_close(ptr)

node = XML.parse("<test />")
node.free

Doing so I noticed that the XML in stdlib classes are structs (probably for performance issues), not classes, so they don't have a finalize method and therefore my suggestion about let Crystal XML module manage the libXML2 memory can't be done without changing this.

straight-shoota commented 3 years ago

The finalize hook would need to be on the libxml2 data structur (LibXML::Node in crystal). I suppose if you use Crystal's GC you could initialize that using LibGC.register_finalizer_ignore_self.

But the best solution would probably be to let libvirt use the libxml2 configuration from Crystal. Because the main program runs in Crystal's runtime, libraries should adopt to that, not the other way around.

hugopl commented 3 years ago

I think the best solution you mentioned isn't feasible, since most libraries that use libxml2 have no way to configure that and don't use a GC, calling the libXML2 free functions themselves...

Even if we convince all projects using libXML2 to fill their projects with #ifdef to avoid call the libXML2 free functions this would not be so useful since no distro would package the library with a compile flag that make it leak memory if not used with a GC.

As I need the xml module in my application that uses libvirt, I'll end up with the worst solution... fork the xml module and apply these minor patches replacing struct by class and adding some finalize calls, at least the module isn't so big.

straight-shoota commented 3 years ago

I just looked how this works in Ruby with nokogiri and libvirt-ruby. It seems nokogiri does not set up libxml2 to use the GC. I suppose they do the same as your patch and free libxml2 memory when the Ruby objects are garbage-collected. That's probably a better solution in general than hooking up libxml2 with the GC directly. Most (or all?) other libc bindings in stdlib work like this, too.

So I suggest we should apply your patch.

hugopl commented 3 years ago

There's no patch ready yet :stuck_out_tongue:, but I can write one. The time to do this is now... since the changes from struct -> class is a API breakage. But unless someone is doing something very weird the change is source-compatible and no good code should stop compiling.

asterite commented 3 years ago

Why does LibXML.xmlFreeNode(@node) need to be called manually if xmlGcMemSetup is already set up to call LibC.free?

hugopl commented 3 years ago

xmlGcMemSetup just tell what free function should be called by LibXML.xmlFreeNode when freeing the resources, so LibXML.xmlFreeNode will call LibC.free.

xmlGcMemSetup implementation just setting a pointer do xmlFree global. https://github.com/GNOME/libxml2/blob/46837d47d59c7b8c9bd1d08a6a717a90a7f1ceb6/xmlmemory.c#L1107

xmlFreeNode, calling xmlFree https://github.com/GNOME/libxml2/blob/01411e7c5ea0fff181271e092f46a2138c3720ec/tree.c#L3775

hugopl commented 3 years ago

BTW, the fix isn't just change struct by class and add some finalizers... since XML::Node class holds a LibXML::Doc*, LibXML::Doc* or a LibXML::Attr*, so if the doc is deleted, the nodes will be deleted in cascade by libXML and futher xmlFreeNode calls will be invalid... the same way if a node is deleted then when I delete the doc I get a double free.

i.e. The fix isn't soooo simple as I expected.

asterite commented 3 years ago

That's the beauty of being able to hook the libxml memory functions to a GC which takes care of all of this for us 😅

hugopl commented 3 years ago

One patch that could be in 1.0.0 is to keep the GC taking care of libXML as it is now and just change the structs to class, so a patch that deals with these memory deallocations can be landed later in a 1.0.1 whatever without break any API... because for sure some finalizers will be used in the solution. I mean, all this depending on time constraints about the 1.0.0 release date.

bcardiff commented 3 years ago

I think the change from struct to class can happen now and later we can figure ways to setup the GC.

hugopl commented 3 years ago

This MR was easy to do :-)

paulocoghi commented 2 years ago

@hugopl , were you able to use libvirt directly on Crystal, without Ruby?

hugopl commented 2 years ago

Yes, it's possible to use, you just can't use/include the XML module from stdlib, since it register hooks to free the memory using the GC.. then when libvirt tries to also free the memory it causes a double free.