JuliaHubOSS / llvm-cbe

resurrected LLVM "C Backend", with improvements

Zero-length arrays (e.g. from C99 flexible array members) not supported #123

Open hikari-no-yume opened 3 years ago

hikari-no-yume commented 3 years ago

In C99 you can put an array at the end of a struct with no size specified, a so-called “flexible array member”. In C89, I think some compilers support specifying an explicit size of 0 as an extension.

Here's a C program demonstrating it:

#include <stddef.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    size_t length;
    char content[];
} string;

string *make_string(size_t length, const char content[length])
{
    size_t size = length + 1;

    string *s = malloc(offsetof(string, content) + size);
    if (!s)
    {
        return NULL;
    }

    s->length = length;
    memcpy(s->content, content, length);
    s->content[length] = '\0';

    return s;
}

void print_string(const string *s)
{
    printf("%.*s\n", (int)s->length, s->content);
}

int main(int argc, char *argv[])
{
    if (argc != 2)
    {
        fprintf(stderr, "Incorrect argument count.\n");
        return 1;
    }

    string *s = make_string(strlen(argv[1]), argv[1]);
    if (!s)
    {
        fprintf(stderr, "Out of memory.\n");
        return 1;
    }

    print_string(s);
    free(s);
    return 0;
}

When compiled with clang -O1 (I haven't tested any other configuration), this produces LLVM IR in which the struct contains a zero-sized array, and that array is the target of a getelementptr (GEP). Because the CBE doesn't emit struct members with zero-sized types, the C it outputs for this program doesn't compile.

I probably won't fix this bug any time soon since it's rather C-specific, and I'm mostly interested in compiling non-C languages to C. But I suppose a similar pattern could appear in another language's LLVM IR. I'm reporting this just for completeness really.

vtjnash commented 3 years ago

Ah, interesting. Yes, that looks like another peculiar exception to the way zero-sized members have been handled. Since C89 has nothing similar, we might need to treat a GEP of one of these members (at any position in the struct) as a special case: its address is the address of the previous member, plus sizeof the previous member, plus whatever re-alignment the zero-sized member requires. Using the address of the next member instead might include padding that shouldn't be part of the address computation. (Similarly, omitting these members might currently lose padding that their alignment would otherwise introduce, but it seems unlikely that anyone is observing and depending on that.)
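To make the address computation described above concrete, here is a minimal sketch in C. It is not what the CBE emits; `align_up`, `struct string_header` and `string_content` are hypothetical names invented for illustration, and the struct stands in for the `string` type from the report with its zero-sized member omitted.

```c
#include <stddef.h>

/* Hypothetical helper: round `offset` up to the next multiple of `align`.
   `align` is assumed to be a power of two. */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* Stand-in for a struct whose zero-sized trailing member was dropped
   from the emitted C (an assumption for this sketch). */
struct string_header {
    size_t length;
    /* char content[0]; would have been here */
};

/* Address of the omitted member: offset of the previous member, plus the
   previous member's size, rounded up to the omitted member's alignment
   (1 for char). */
static char *string_content(struct string_header *s)
{
    size_t off = offsetof(struct string_header, length) + sizeof(s->length);
    return (char *)s + align_up(off, 1);
}
```

Because the computation starts from the previous member rather than the next one, any padding that precedes the next member never enters the result.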

hikari-no-yume commented 3 years ago

When targeting pure C89, one way to achieve the same thing is to use a single-element array instead. We could do that, but it would only work if the struct isn't embedded in any other struct, and only when the zero-length array is the final member of the struct…
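For reference, the single-element-array idiom looks roughly like this when written by hand. This is only a sketch of the C89 pattern, not proposed CBE output; `struct string89` and `make_string89` are hypothetical names. Strictly speaking, indexing `content` past its declared bound is not guaranteed by the C standard, although the idiom is widely relied upon.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* C89-compatible stand-in: a one-element array takes the place of the
   flexible array member. */
struct string89 {
    size_t length;
    char content[1];
};

static struct string89 *make_string89(size_t length, const char *content)
{
    /* Use offsetof rather than sizeof: sizeof would also count the
       placeholder element and any tail padding. */
    struct string89 *s = malloc(offsetof(struct string89, content) + length + 1);
    if (!s)
    {
        return NULL;
    }

    s->length = length;
    memcpy(s->content, content, length);
    s->content[length] = '\0';

    return s;
}
```

The placeholder element makes `sizeof(struct string89)` larger than the offset of `content`, which is why the substitution changes the layout as soon as the struct is embedded in another struct or the array is not the last member.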