JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.8k stars 5.49k forks source link

mini julep: C ABI for inline structs and unions #33120

Open c42f opened 5 years ago

c42f commented 5 years ago

We have changed ABIs for Vectors of Union and struct more than once now, (see #32448 and #23577). The latest change got me thinking again and hoping for a stable, interoperable and efficient ABI for these things. This would make Union{Missing,T} work better in more cases, help with #26309, https://github.com/RelationalAI-oss/Blobs.jl/issues/2, possibly #29289, etc.

The idea is that the data part of nested struct-and-union should follow the C ABI when stored inline, with all type selector bits hoisted outward to be stored separately from the data. For Arrays, the hoisting would move selectors to the end of the array as we currently do for Union{Missing,T}.

This is a natural generalization of the Array-of-Union layout introduced by #23577. In the same way, it allows for memory efficient storage of the data fields and selector bytes by respecting the natural alignment of each. It provides a simple general rule for writing C data structures which alias Julia data.

As a concrete example of this hoisting, here's some structs and unions written out in terms of C data structures.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Type for selector bits
typedef uint8_t Selector;

// Some structs containing unions
//
// struct X
//     f::Union{UInt8,Float64}
// end
// 
// struct Y
//     f::Union{UInt8,Int32}
// end

// C representation for these with selector byte split out separately
struct X { union { uint8_t _1; double   _2; } f; };
struct Y { union { uint8_t _1; uint64_t _2; } f; };

struct X_selectors { Selector sel_f; };
struct Y_selectors { Selector sel_f; };

// Aggregate of structs-containing-Unions
//
// struct A
//     x::X
//     y::Y
// end

struct A {
    struct X x;
    struct Y y;
};

// Aggregate of selector bytes mirrors original struct layout
struct A_selectors {
    struct X_selectors x;
    struct Y_selectors y;
};

// A more complicated example containing Unions of structs with unions.
//
// struct D
//     x::X
//     xy::Union{X,Y}
// end

struct D {
    struct X x;
    union {
        struct X _1;
        struct Y _2;
    } xy;
};

struct D_selectors {
    struct X_selectors x;
    union {
        struct X_selectors _1;
        struct Y_selectors _2;
    } xy;
    Selector sel_xy;
    // Should `xy` or `sel_xy` come first here?
};

// Dump binary representation of data at p.
void hexdump(const char* name, void* p, size_t s) {
    uint8_t* c = (uint8_t*)p;
    printf("%s = ", name);
    for (size_t i = 0; i < s; ++i) {
        printf("%02x", c[i]);
        if ((i+1) % 4 == 0)
            printf(" ");
    }
    printf("\n");
};

int main()
{
    struct A a;
    struct A_selectors a_sel;
    memset(&a, '\0', sizeof(a));
    memset(&a_sel, '\0', sizeof(a_sel));

    // a = A(123.123, 0xff)
    a.x.f._2 = 123.123;
    a_sel.x.sel_f = 2;   // a.x isa Float64
    a.y.f._1 = 0xff;
    a_sel.y.sel_f = 1;   // a.y isa UInt8

    hexdump("a", &a, sizeof(a));
    hexdump("a_sel", &a_sel, sizeof(a_sel));

    struct D d;
    struct D_selectors d_sel;
    memset(&d, '\0', sizeof(d));
    memset(&d_sel, '\0', sizeof(d_sel));

    // d = D(X(123.123), Y(0x11223344))
    d.x.f._1 = 0xff;
    d_sel.x.sel_f = 1;      // d.x.f  isa UInt8

    d.xy._2.f._2 = 0x1122334455667788;
    d_sel.sel_xy = 2;       // d.xy   isa Y
    d_sel.xy._1.sel_f = 2;  // d.xy.f isa UInt64

    hexdump("d", &d, sizeof(d));
    hexdump("d_sel", &d_sel, sizeof(d_sel));

    // Layout for Vector{A} a combined allocation of `as` and `as_sel`
    struct A as[2];
    struct A_selectors as_sel[2];
    memset(&as, '\0', sizeof(as));
    memset(&as_sel, '\0', sizeof(as_sel));

    // as = A[A(X(123.123), Y(0xff)), A(X(0xff), Y(0x1122334455667788))]
    // as[1] = A(X(123.123), Y(0xff))
    as[0].x.f._2 = 123.123;
    as_sel[0].x.sel_f = 2;   // as[1].x isa Float64
    as[0].y.f._1 = 0xff;
    as_sel[0].y.sel_f = 1;   // as[1].y isa UInt8
    // as[2] = A(X(0xff), Y(0x1122334455667788))
    as[1].x.f._1 = 0xff;
    as_sel[1].x.sel_f = 1;   // as[1].x isa UInt8
    as[1].y.f._2 = 0x1122334455667788;
    as_sel[1].y.sel_f = 2;   // as[1].y isa UInt64

    hexdump("as", &as, sizeof(as));
    hexdump("as_sel", &as_sel, sizeof(as_sel));

    return 0;
};

Program output:

$ g++ julia_selectors.c && ./a.out 
a = 1d5a643b dfc75e40 ff000000 00000000 
a_sel = 0201
d = ff000000 00000000 88776655 44332211 
d_sel = 010202
as = 1d5a643b dfc75e40 ff000000 00000000 ff000000 00000000 88776655 44332211 
as_sel = 02010102

@vtjnash @quinnj this is a more complete explanation of my vague thought-bubble from https://github.com/JuliaLang/julia/pull/32448#issuecomment-507010029

andyferris commented 5 years ago

I completely agree with and support that we should stabilize a C-compatible (and friendly) ABI for Unions in invariant positions (and possibly covariant ones too, eventually?).

The proposal in the OP seems consistent with what we currently do for arrays... but I also wonder if it would be nice to support other layouts (I’m speculating here that tricks like we do for VecElement might help define a different layout to choose when we are interoperating with C code?)

c42f commented 5 years ago

Regarding other layouts, I'm not sure that's the business of the compiler? People can already define libraries like StructArrays.jl to optimize storage layout for the particular purposes of their application while viewing the data in a more convenient form.