gnustep / libobjc2

Objective-C runtime library intended for use with Clang.
http://www.gnustep.org/
MIT License
426 stars 116 forks source link

Segfault in SparseArrayLookup; CAAction's protocol version is invalid #283

Open ethanc8 opened 4 months ago

ethanc8 commented 4 months ago

Hello đź‘‹, I have encountered a segfault in SparseArrayLookup during the initialization of my port of the application GitY. Here is the backtrace from GDB:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff79c8a15 in SparseArrayLookup () from /usr/GNUstep/Local/Library/Libraries/libobjc.so.4.6
(gdb) bt
#0  0x00007ffff79c8a15 in SparseArrayLookup () from /usr/GNUstep/Local/Library/Libraries/libobjc.so.4.6
#1  0x00007ffff79c9369 in objc_send_initialize () from /usr/GNUstep/Local/Library/Libraries/libobjc.so.4.6
#2  0x00007ffff79c9103 in objc_send_initialize () from /usr/GNUstep/Local/Library/Libraries/libobjc.so.4.6
#3  0x00007ffff79d2039 in slowMsgLookup () from /usr/GNUstep/Local/Library/Libraries/libobjc.so.4.6
#4  0x00007ffff79d77dc in objc_msgSend_fpret () from /usr/GNUstep/Local/Library/Libraries/libobjc.so.4.6
#5  0x00007ffff5f2f63a in +[CGImageDestinationTIFF load] (self=0x7ffff5f5b778 <._OBJC_CLASS_CGImageDestinationTIFF>, _cmd=0x7ffff7fab000)
    at image/OPImageCodecTIFF.m:323
#6  0x00007ffff79c552b in objc_send_load_message () from /usr/GNUstep/Local/Library/Libraries/libobjc.so.4.6
#7  0x00007ffff79c60e8 in objc_resolve_class () from /usr/GNUstep/Local/Library/Libraries/libobjc.so.4.6
#8  0x00007ffff79c628d in objc_resolve_class_links () from /usr/GNUstep/Local/Library/Libraries/libobjc.so.4.6
#9  0x00007ffff79cc28c in __objc_load () from /usr/GNUstep/Local/Library/Libraries/libobjc.so.4.6
#10 0x00007ffff5f0be8d in objcv2_load_function () from /usr/GNUstep/Local/Library/Libraries/libopal.so.0
#11 0x00007ffff7fc947e in call_init (l=<optimized out>, argc=argc@entry=1, argv=argv@entry=0x7fffffffd3d8, env=env@entry=0x7fffffffd3e8)
    at ./elf/dl-init.c:70
#12 0x00007ffff7fc9568 in call_init (env=0x7fffffffd3e8, argv=0x7fffffffd3d8, argc=1, l=<optimized out>) at ./elf/dl-init.c:33
#13 _dl_init (main_map=0x7ffff7ffe2e0, argc=1, argv=0x7fffffffd3d8, env=0x7fffffffd3e8) at ./elf/dl-init.c:117
#14 0x00007ffff7fe32ca in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#15 0x0000000000000001 in ?? ()
#16 0x00007fffffffd8a2 in ?? ()
#17 0x0000000000000000 in ?? ()

Interestingly, I have not encountered this when running any of the libs-opal tests, such as images. I have confirmed with GDB that images calls +[CGImageDestinationTIFF load] without segfaulting. Therefore, I don't know in which component this came from.

The Objective-C components that are linked to by GitY are:

It's possible that one of those has messed something up in their +load methods, but I'm not exactly sure. If needed, I can send you binaries of my GNUstep installation, compile libobjc2 with different flags, peek around in GDB, or whatever else would be needed.

ethanc8 commented 1 month ago

The original CAAnimation.m gets CAAction compiled to:

$._OBJC_PROTOCOL_CAAction = comdat any
// a few hundred lines later
@52 = private unnamed_addr constant [9 x i8] c"CAAction\00", align 1
@._OBJC_PROTOCOL_CAAction = global { ptr, ptr, ptr, ptr, ptr, ptr, ptr, ptr, ptr, ptr, ptr } { ptr inttoptr (i32 4 to ptr), ptr @52, ptr @.objc_protocol_list.10, ptr @.objc_protocol_method_list.11, ptr @.objc_protocol_method_list.13, ptr @.objc_protocol_method_list.12, ptr @.objc_protocol_method_list.14, ptr null, ptr null, ptr null, ptr null }, section "__objc_protocols", comdat, align 8
// a few lines later
@.objc_protocol_list.20 = internal global { ptr, i64, [4 x ptr] } { ptr null, i64 4, [4 x ptr] [ptr @._OBJC_PROTOCOL_NSCoding, ptr @._OBJC_PROTOCOL_NSCopying, ptr @._OBJC_PROTOCOL_CAAction, ptr @._OBJC_PROTOCOL_CAMediaTiming] }, align 8

I've attached the full LLVM IR to https://gist.github.com/ethanc8/d02ef13441e1eb58d372cf650fa3f63b

davidchisnall commented 1 month ago

That looks correct. The isa pointer is that ptr inttoptr (i32 4 to ptr) one: we're initialising it to 4 in the compiler, so it should be a GNUstepV2 protocol. So now the question is: where does that become 0? Can you put a watchpoint on the load address and see if there are any writes to it?

ethanc8 commented 1 month ago

Can you put a watchpoint on the load address and see if there are any writes to it?

Which load address are you talking about?

ethanc8 commented 1 month ago

Ok, I set a watchpoint at 0x7ffff6ab3a60 (the address of CAAction).

Hardware watchpoint 1: *(int*)0x7ffff6ab3a60

Old value = 4
New value = 0
registerProtocol (proto=0x7ffff6ab3a60 <._OBJC_PROTOCOL_CAAction>) at /home/ethan/Projects/GNUstep/plaurent2/gnustep/libobjc2/protocol.c:608
608             if (protocol_for_name(proto->name) == NULL)
(gdb) 
ethanc8 commented 1 month ago

If I try to forcefully reset it back to 4, it does:

(gdb) set var proto->isa=0x4
(gdb) continue
Continuing.

Hardware watchpoint 1: *(int*)0x7ffff6ab3a60

Old value = 4
New value = -143207376
init_protocols (protocols=0x7ffff6aaa030 <objc_protocol_list>) at /home/ethan/Projects/GNUstep/plaurent2/gnustep/libobjc2/protocol.c:245
245                     if (NULL != aProto->protocol_list)
(gdb) 

and then fails with Unknown protocol version.

davidchisnall commented 1 month ago

Okay, this looks like a load-order problem. The line above the one in your gdb session is proto->isa = protocol_class_gsv2;. If we haven’t yet loaded the protocol classes, this will be null, and that seems to be what you’re seeing.

Most of the paths that should be here are guarded on having loaded the runtime. Can you print a backtrace there? I suspect the simplest thing to do is treat the built-in classes as special and define them in C rather than relying on the normal load sequence.

Does the library defining this protocol explicitly link libobjc2? If it does, we should see the libobjc2 constructors run first (I think).

ethanc8 commented 1 month ago

Here's the backtrace:

Hardware watchpoint 1: *(int*)0x7ffff6ab3a60

Old value = 4
New value = 0
registerProtocol (proto=0x7ffff6ab3a60 <._OBJC_PROTOCOL_CAAction>) at /home/ethan/Projects/GNUstep/plaurent2/gnustep/libobjc2/protocol.c:608
608             if (protocol_for_name(proto->name) == NULL)
(gdb) bt
#0  registerProtocol (proto=0x7ffff6ab3a60 <._OBJC_PROTOCOL_CAAction>) at /home/ethan/Projects/GNUstep/plaurent2/gnustep/libobjc2/protocol.c:608
#1  0x00007ffff774b1a5 in __objc_load (init=0x7ffff6aa9638 <objc_init>) at /home/ethan/Projects/GNUstep/plaurent2/gnustep/libobjc2/loader.c:241
#2  0x00007ffff7fc947e in call_init (l=<optimized out>, argc=argc@entry=1, argv=argv@entry=0x7fffffffd398, env=env@entry=0x7fffffffd3a8) at ./elf/dl-init.c:70
#3  0x00007ffff7fc9568 in call_init (env=0x7fffffffd3a8, argv=0x7fffffffd398, argc=1, l=<optimized out>) at ./elf/dl-init.c:33
#4  _dl_init (main_map=0x7ffff7ffe2e0, argc=1, argv=0x7fffffffd398, env=0x7fffffffd3a8) at ./elf/dl-init.c:117
#5  0x00007ffff7fe32ca in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#6  0x0000000000000001 in ?? ()
#7  0x00007fffffffd88b in ?? ()
#8  0x0000000000000000 in ?? ()
(gdb) p protocol_class_gsv2
$1 = (id) 0x0
ethanc8 commented 1 month ago

All of the libraries and apps should be linking libobjc2:

$ gnustep-config --objc-libs
-ldispatch -pthread -fexceptions -rdynamic -fobjc-runtime=gnustep-2.1 -fblocks -L/home/ethan/GNUstep/Library/Libraries -L/usr/GNUstep/Local/Library/Libraries -L/usr/GNUstep/System/Library/Libraries -lpthread -lobjc -lm
davidchisnall commented 1 month ago

Please can you test #294 and see if this fixes it for you?

ethanc8 commented 1 month ago

Please can you test #294 and see if this fixes it for you?

I will try to do so tonight, but I'm a bit busy today.

ethanc8 commented 1 month ago

Please can you test #294 and see if this fixes it for you?

Yes, it seems to be working now. Thanks! Do you want me to confirm anything in the debugger for you? I have just noticed that it no longer aborts with "Unknown protocol version".

davidchisnall commented 1 month ago

No, that’s great. Thanks.

ethanc8 commented 1 month ago

We still need to figure out what caused the segfault in SparseArrayLookup.