gnustep / libobjc2

Objective-C runtime library intended for use with Clang.
http://www.gnustep.org/
MIT License

objc4 class structure incompatibilities and Swift interop considerations #306

Open hmelder opened 5 days ago

hmelder commented 5 days ago

I am currently looking into the implementation details of the Objective-C interoperability in the Swift compiler and runtime, and it turns out that we have a bit of a problem with the layout of objc_class.

objc4 splits a class into three structures: objc_class, class_ro_t, and class_rw_t. objc_class is the one embedded into the Swift class metadata described below.

Embedding the existing libobjc2 class structure will be difficult as it will probably break the contract between compiler and runtime in Swift. However, changing objc_class breaks the existing ABI in libobjc2.

@davidchisnall there is probably no other way than to add an -fobjc-runtime=gnustep-3.0, or to make use of the version number field (probably very ugly). This should still be a lot less effort than porting objc4, given its dependence on Mach-O.

Common Metadata Layout

Assuming sizeof(void *) == 8:

| Offset | Description |
| --- | --- |
| -8 | Value Witness Table |
| 0 | Kind |

Class Metadata Layout

The class metadata is of kind 0 on non-Apple platforms, and an isa pointer to an ObjC metaclass otherwise. There are 5 words reserved for objc_class starting at offset 0, which is exactly the size of the new (objc4) objc_class.
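The five-word figure can be checked against the objc4 header quoted further down. The following is only a sketch: the `...Sketch` types are hypothetical mirrors written for this comment and do not exist in objc4, Swift, or libobjc2. It assumes that `class_data_bits_t` is a single pointer-sized word.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirrors of the objc4 structures quoted in this issue.
// None of these names exist in objc4, Swift, or libobjc2.
#if defined(__LP64__)
using mask_sketch_t = std::uint32_t;  // mask_t on LP64
#else
using mask_sketch_t = std::uint16_t;  // mask_t elsewhere
#endif

struct CacheSketch {
    void *buckets;           // bucket_t *_buckets
    mask_sketch_t mask;      // _mask
    mask_sketch_t occupied;  // _occupied
};

struct ObjcClassSketch {
    void *isa;            // inherited from objc_object
    void *superclass;     // Class superclass
    CacheSketch cache;    // cache_t, two words
    std::uintptr_t bits;  // class_data_bits_t (assumed to be one word)
};

// Converts a size in bytes to a size in pointer-sized words.
constexpr std::size_t sizeInWords(std::size_t bytes) {
    return bytes / sizeof(void *);
}

static_assert(sizeof(CacheSketch) == 2 * sizeof(void *),
              "cache_t is expected to occupy two words");
static_assert(sizeInWords(sizeof(ObjcClassSketch)) == 5,
              "objc_class is expected to occupy five words");
```

If the assertions hold on the target, isa, superclass, the two cache words, and bits line up with word offsets 0 to 4 of the Swift class metadata.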

From docs/ABI/TypeMetadata.rst in swiftlang/swift

Class metadata is designed to interoperate with Objective-C; all class metadata
records are also valid Objective-C ``Class`` objects. Class metadata pointers
are used as the values of class metatypes, so a derived class's metadata
record also serves as a valid class metatype value for all of its ancestor
classes.

- The **destructor pointer** is stored at **offset -2** from the metadata
  pointer, behind the value witness table. This function is invoked by Swift's
  deallocator when the class instance is destroyed.
- The **isa pointer** pointing to the class's Objective-C-compatible metaclass
  record is stored at **offset 0**, in place of an integer kind discriminator.
- The **super pointer** pointing to the metadata record for the superclass is
  stored at **offset 1**. If the class is a root class, it is null.
- On platforms which support Objective-C interoperability, two words are
  reserved for use by the Objective-C runtime at **offset 2** and **offset
  3**; on other platforms, nothing is reserved.
- On platforms which support Objective-C interoperability, the **rodata 
  pointer** is stored at **offset 4**; on other platforms, it is not present. 
  The rodata pointer points to an Objective-C compatible rodata record for the 
  class. This pointer value includes a tag.
  The **low bit is always set to 1** for Swift classes and always set to 0 for
  Objective-C classes.
- The **class flags** are a 32-bit field at **offset 5** on platforms which 
  support Objective-C interoperability; on other platforms, the field is at 
  **offset 2**.
 [...]
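To make the word offsets in the quoted documentation concrete, here is a rough sketch of the interop layout as a plain struct, together with the rodata tag handling it describes. `ClassMetadataSketch` and the helper functions are hypothetical names for illustration only; the real definitions live in the Swift runtime and may differ in detail.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical flattened view of Swift class metadata on a platform with
// Objective-C interop. The metadata pointer (address point) refers to `isa`.
struct ClassMetadataSketch {
    void (*destructor)(void *);   // word -2
    const void *valueWitnesses;   // word -1
    const void *isa;              // word  0 (address point)
    const void *super;            // word  1
    void *objcReserved[2];        // words 2 and 3, owned by the ObjC runtime
    std::uintptr_t rodata;        // word  4, tagged pointer
    std::uint32_t classFlags;     // word  5, 32-bit field
};

// Word offset of a field relative to the address point.
constexpr std::ptrdiff_t wordOffset(std::size_t byteOffset) {
    return (static_cast<std::ptrdiff_t>(byteOffset) -
            static_cast<std::ptrdiff_t>(offsetof(ClassMetadataSketch, isa))) /
           static_cast<std::ptrdiff_t>(sizeof(void *));
}

// Per the quoted text: low bit 1 means Swift class, 0 means Objective-C class.
constexpr bool isSwiftClass(std::uintptr_t rodata) {
    return (rodata & 1) != 0;
}

// Strips the tag bit to recover the address of the rodata record.
constexpr std::uintptr_t rodataAddress(std::uintptr_t rodata) {
    return rodata & ~static_cast<std::uintptr_t>(1);
}

static_assert(wordOffset(offsetof(ClassMetadataSketch, destructor)) == -2,
              "destructor sits two words before the address point");
static_assert(wordOffset(offsetof(ClassMetadataSketch, rodata)) == 4,
              "rodata pointer sits at word 4");
```

A runtime that wants to tell Swift classes from plain Objective-C classes would, under these assumptions, test `isSwiftClass(cls->rodata)` before touching anything past word 4.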

From runtime/objc-runtime-new.h in objc4

#if __LP64__
typedef uint32_t mask_t;  // x86_64 & arm64 asm are less efficient with 16-bits
#else
typedef uint16_t mask_t;
#endif
typedef uintptr_t cache_key_t;

struct bucket_t {
private:
    cache_key_t _key;
    IMP _imp;

public:
    inline cache_key_t key() const { return _key; }
    inline IMP imp() const { return (IMP)_imp; }
    inline void setKey(cache_key_t newKey) { _key = newKey; }
    inline void setImp(IMP newImp) { _imp = newImp; }

    void set(cache_key_t newKey, IMP newImp);
};

struct cache_t {
    struct bucket_t *_buckets;
    mask_t _mask;
    mask_t _occupied;
};

struct objc_class : objc_object {
    // Class ISA;
    Class superclass;
    cache_t cache;             // formerly cache pointer and vtable
    class_data_bits_t bits;    // class_rw_t * plus custom rr/alloc flags
   [...]
};

struct class_ro_t {
    uint32_t flags;
    uint32_t instanceStart;
    uint32_t instanceSize;
#ifdef __LP64__
    uint32_t reserved;
#endif

    const uint8_t * ivarLayout;

    const char * name;
    method_list_t * baseMethodList;
    protocol_list_t * baseProtocols;
    const ivar_list_t * ivars;

    const uint8_t * weakIvarLayout;
    property_list_t *baseProperties;

    method_list_t *baseMethods() const {
        return baseMethodList;
    }
};

hmelder commented 2 days ago

Looking further into Swift's code gen for ObjC, I am getting a bit more pessimistic, as there are a dozen files where internal ObjC structures are accessed directly. There is a lot of abstraction leakage, to the point where Swift classes are essentially a superset of objc_class without a dtable. Not sure if they even use the same sidetable for weak references.

Options are:

  1. Creating a new libobjc2 ABI that splits objc_class into two structures, provides facilities for weak ref interop, and supports initialisation in a preallocated region (plus all the changes mirrored to CGObjCGNU in clang and Swift Codegen):

     %struct._class_t = type { ptr, ptr, ptr, ptr, ptr }
     %struct._class_ro_t = type { i32, i32, i32, ptr, ptr, ptr, ptr, ptr, ptr, ptr }

  2. Trying to increase the Swift object size to fit the existing objc_class (will fail horribly because of all the contracts between stdlib, runtime, and swiftc)
  3. Porting objc4 to ELF and abstracting over the dyld- and Mach-O-specific parts to get the SPIs for Swift interop. This would essentially be a complete rewrite of objc4 and would introduce a new ABI, as the objc4 load function completely depends on a Mach-O header. It would also still require major changes in Swift interop. So not worth it compared to 1.
  4. Calling it a day
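For reference, the two LLVM types listed under option 1 can be compared against the class_ro_t quoted from objc4. This is a sketch under the assumption that the i32 and ptr fields map one-to-one onto the header fields. The `...Sketch` types are hypothetical and only show that the explicit `reserved` word in the header and the implicit padding in the IR type lead to the same layout.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of class_ro_t as written in the objc4 header.
struct ClassRoHeaderSketch {
    std::uint32_t flags;
    std::uint32_t instanceStart;
    std::uint32_t instanceSize;
#if defined(__LP64__)
    std::uint32_t reserved;  // explicit padding on LP64
#endif
    const std::uint8_t *ivarLayout;
    const char *name;
    const void *baseMethodList;
    const void *baseProtocols;
    const void *ivars;
    const std::uint8_t *weakIvarLayout;
    const void *baseProperties;
};

// Hypothetical mirror of { i32, i32, i32, ptr x 7 }: no explicit reserved
// field, the compiler inserts alignment padding before the first pointer.
struct ClassRoIRSketch {
    std::uint32_t flags;
    std::uint32_t instanceStart;
    std::uint32_t instanceSize;
    const void *pointers[7];
};

// Hypothetical mirror of { ptr, ptr, ptr, ptr, ptr }.
struct ClassSketch {
    const void *words[5];
};

static_assert(sizeof(ClassRoHeaderSketch) == sizeof(ClassRoIRSketch),
              "header and IR views should have the same size");
static_assert(offsetof(ClassRoHeaderSketch, ivarLayout) ==
                  offsetof(ClassRoIRSketch, pointers),
              "the first pointer should start at the same offset");
static_assert(sizeof(ClassSketch) == 5 * sizeof(void *),
              "the class structure should be five words");
```

The five-word class structure matches the space Swift reserves in its class metadata, which is what makes option 1 the least invasive on the Swift side.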