chsasank / llama.lisp

Lisp dialect designed for HPC and AI
GNU Lesser General Public License v2.1

Linking a function declaration and definition with different argument types works! #22

Open GlowingScrewdriver opened 5 months ago

GlowingScrewdriver commented 5 months ago

One of the tests exploits this to print a boolean value with the runtime function print, which runtime.c defines as taking an integer argument. Turns out, it works just fine! Boolean true and false get printed as 1 and 0:

(brilisp
    (bril-define ((print bool) (b bool)))

    (bril-define ((main void))
        (set (v1 float) (const 50.0))
        (set (v2 float) (const 50.1))

        (set (res bool) (feq v1 v2))
        (set (tmp bool) (call print res))
        ;; ...
        (ret)))

It is interesting to note that runtime.c and the brilisp program are compiled independently of each other, so the two compilation processes share no information until link time. Only after both have been assembled into object files does the linker look for the definition of print, find it in the object produced from runtime.c, and resolve the calls in the brilisp program to it.

Turns out, object files do not hold information about function types and parameters (debug info aside); a symbol is essentially just a name and an address. This is why no error is produced at any stage: the linker has no way to determine that the two declarations of print have different types.
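
To make the name-only lookup concrete, here is a toy model in C. It is only a sketch, not how any real linker is implemented: `symtab`, `resolve`, `generic_fn` and `print_impl` are hypothetical names invented for the illustration. The point is that a table entry carries a name and an address, and nothing about argument types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generic function-pointer type: the only thing the toy table stores. */
typedef void (*generic_fn)(void);

/* One toy symbol-table entry: a name and an address, no type information. */
struct symbol {
    const char *name;
    generic_fn addr;
};

/* Hypothetical stand-in for the runtime's print, which takes an int. */
static int print_impl(int n) { return n; }

static const struct symbol symtab[] = {
    { "print", (generic_fn)print_impl },
};

/* Resolve a symbol by name alone; returns NULL if the name is not defined. */
static generic_fn resolve(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof symtab / sizeof symtab[0]; i++) {
        if (strcmp(symtab[i].name, name) == 0)
            return symtab[i].addr;
    }
    return NULL;
}
```

A caller that believes print takes a bool would simply cast the pointer returned by `resolve("print")` to that type, and nothing in the table could contradict it. (Actually calling through the mismatched type is undefined behavior in ISO C, so whether it works is decided at the ABI level.)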

Additionally, in a C program compiled for x86-64, when the stack is used for parameter passing, each integer parameter takes up one 8-byte slot on the stack, regardless of whether it is a char, an int, or something else (of course it would get a little more complicated if structs were being passed). Smaller values are widened to 32 bits before being passed, and the same appears to hold for arguments passed in registers, as the single argument to print would be. This is presumably what prevents junk from entering the higher bits of the number when it is passed to print. For example, the following function call in C:

    char a = 3;
    int b = 4;
    fn (1, 2, 3, 4, 5, 6, 7, a, b, a, b);

gets converted to

    movl    $1, %edi
    # ... a few more registers ...
    movl    $7, (%rsp)          # from here onwards, the stack is used
    movl    %ebx, 8(%rsp)
    movl    %r11d, 16(%rsp)
    movl    %r10d, 24(%rsp)
    movl    %eax, 32(%rsp)
    movb    $0, %al             # number of vector registers used by the call
    callq   fn@PLT
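
The slot layout above can be modelled in plain C. This is a rough sketch under stated assumptions, not a description of what the compiler does internally: `stack_slot`, `store_char_arg` and `load_int_arg` are hypothetical names, and the model only mimics a 4-byte store into an 8-byte slot followed by a 4-byte load.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 8-byte argument slot on the stack. */
typedef struct { unsigned char bytes[8]; } stack_slot;

/* Caller side: widen a char to 32 bits and store only those 4 bytes,
   mimicking a movl into the slot. The other 4 bytes keep their old contents. */
static void store_char_arg(stack_slot *slot, char c)
{
    int32_t widened = c;    /* ordinary integer conversion, value-preserving */
    assert(slot != NULL);
    memcpy(slot->bytes, &widened, sizeof widened);
}

/* Callee side: a callee expecting an int reads back the same 4 bytes. */
static int32_t load_int_arg(const stack_slot *slot)
{
    int32_t value;
    assert(slot != NULL);
    memcpy(&value, slot->bytes, sizeof value);
    return value;
}
```

In this model, a callee that reads the slot as an int sees exactly the value the caller passed as a char, while whatever was in the remaining four bytes of the slot is never looked at.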

In case it isn't clear, this is not really a bug. It's just interesting behavior that is permitted by how things work at the linker level and below.