cda-group / arc

Programming Language for Continuous Deep Analytics
https://cda-group.github.io/arc/

Rust codegen for extern functions #362

Closed segeljakt closed 2 years ago

segeljakt commented 2 years ago

In the Arc standard library, we have:

extern type Cell[T];

extern def cell[T](T): Cell[T];
extern def update[T](Cell[T], T);
extern def read[T](Cell[T]): T;

On the Rust side we have correspondingly:

fn cell<T>(x: T, _: Context) -> Cell<T> {
    Cell { data: Rc::new(x) }
}

If one uses a cell to store i32, then we get:

func private @cellsi32(%_0: si32) -> !arc.adt<"Cell<i32>">

This means the rest of the program expects a Rust function named cellsi32, but no such function exists, so any reference to it results in a compilation error 😬. One solution is to allow annotating a function with its unmangled identifier, for example:

func private @cellsi32(%_0: si32) -> !arc.adt<"Cell<i32>"> {unmangled = cell}

Then we could generate Rust code such as this, which will delegate the call to the unmangled generic function.

#[rewrite]
#[inline(always)]
fn cellsi32(_0: i32) -> Cell<i32> {
    call!(cell(_0))
}

We could alternatively generate:

#[rewrite(unmangled = cell)]
extern fn cellsi32(i32) -> Cell<i32>;

And then expand it to the previous code.
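
To illustrate what the delegation amounts to once the macros are stripped away, here is a minimal sketch. It makes several assumptions: the Cell layout and the read helper are simplified stand-ins, the Context parameter is left out, and plain function calls take the place of #[rewrite] and call!. It is not the actual macro expansion.

```rust
use std::rc::Rc;

// Simplified stand-in for the runtime's Cell type (assumption: the real
// definition may carry more than an Rc).
#[derive(Debug)]
pub struct Cell<T> {
    data: Rc<T>,
}

// Generic library function. The Context parameter from the thread is
// omitted to keep the sketch self-contained.
pub fn cell<T>(x: T) -> Cell<T> {
    Cell { data: Rc::new(x) }
}

// Hypothetical reader used only to observe the stored value.
pub fn read<T: Clone>(c: &Cell<T>) -> T {
    (*c.data).clone()
}

// Mangled, monomorphic wrapper: the name the generated code expects.
// It only forwards to the unmangled generic function, and T is inferred
// as i32 from the signature.
#[inline(always)]
pub fn cellsi32(_0: i32) -> Cell<i32> {
    cell(_0)
}
```

Under these assumptions, calling cellsi32(7) and then read on the result yields 7, and the wrapper adds no logic of its own.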

frej commented 2 years ago

Does this issue have the right title? Isn't it about being able to add arbitrary annotations to function declarations, rather than calling external functions?

[edit: found out that extern is Rust for C-ABI functions, so no complaints on the title]

If we add two new attributes, rust.external and rust.annotation, then

func private @cellsi32(%_0: si32) -> !arc.adt<"Cell<i32>">
    attributes {rust.external, rust.annotation="#[rewrite(unmangled = cell)]"}

can give us:

#[rewrite(unmangled = cell)]
extern fn cellsi32(i32) -> Cell<i32>;

segeljakt commented 2 years ago

Ok, it is also possible to generate:

#[rewrite(unmangled = cell)]
fn cellsi32(i32) -> Cell<i32> {}

The only requirement is that it must be parseable by Rust. I just put extern because then we can omit the function body.

frej commented 2 years ago

So

func private @cellsi32(%_0: si32) -> !arc.adt<"Cell<i32>">
    attributes {rust.annotation="#[rewrite(unmangled = cell)]"} {}

Which produces:

#[rewrite(unmangled = cell)]
fn cellsi32(i32) -> Cell<i32> {}

Will be adequate?

segeljakt commented 2 years ago

That will work well 👍

segeljakt commented 2 years ago

One thing is that we might need to quote the identifier so it becomes:

func private @cellsi32(%_0: si32) -> !arc.adt<"Cell<i32>">
    attributes {rust.annotation="#[rewrite(unmangled = \"cell\")]"} {}

segeljakt commented 2 years ago

I tried out the code generation part. It seems like

#[rewrite(unmangled = "cell")]
fn cellsi32(i32) -> Cell<i32> {}

is the best approach.

The other does not compile:

#[rewrite(unmangled = cell)]
extern fn cellsi32(i32) -> Cell<i32>;

I forgot that you would need a block:

#[rewrite(unmangled = cell)]
extern {
    fn cellsi32(i32) -> Cell<i32>;
}

frej commented 2 years ago

I had a brain wave overnight. The functionality is already there without any need for rewrite macros or inlining directives. Define cellsi32 as an MLIR external function:

func private @cellsi32(%_0: si32) -> !arc.adt<"Cell<i32>">
    attributes {arc.rust_name="cell"}

Whenever you call cellsi32 you do:

%r = call @cellsi32(%0) : (si32) -> !arc.adt<"Cell<i32>">

and you will get the Rust code:

let v1:Cell<i32> = call!(cell(val!(v0), ));

segeljakt commented 2 years ago

wow!

frej commented 2 years ago

There is a bug when the external function is used in a constant, #363 will fix that.

segeljakt commented 2 years ago

Is it possible to "instantiate" the function? For example:

func private @cellsi32(%_0: si32) -> !arc.adt<"Cell<i32>">
    attributes {arc.rust_name="cell::<i32>"}

and generate

let v1:Cell<i32> = call!(cell::<i32>(val!(v0), ));

I'm trying to think about how this will work if we want to create a function pointer... 🤔

frej commented 2 years ago

Whatever is in the attribute gets passed on to Rust, so

func private @cellsi32(%_0: si32) -> !arc.adt<"Cell<i32>">
  attributes {arc.rust_name="cell::<i32>"}

will get you:

let v1:Cell<i32> = call!(cell::<i32>(val!(v0), ));

Function pointers work too:

  func @caller1(%0: si32) -> !arc.adt<"Cell<i32>"> {
    %f = constant @cellsi32 : (si32) -> !arc.adt<"Cell<i32>">
    %r = call_indirect %f(%0) : (si32) -> !arc.adt<"Cell<i32>">
    return %r : !arc.adt<"Cell<i32>">
  }

will give you (if folding is turned off)

pub fn caller1(v2: i32) -> Cell<i32> {
  let v3 : function!((i32,) -> Cell<i32>) = cell::<i32>;
  let v4:Cell<i32> = call_indirect!((val!(v3))(val!(v2), ));
  return val!(v4);
}

Will we need some extra Rust syntax for that?

segeljakt commented 2 years ago

When we create the function pointer we need to use function!(...)

pub fn caller1(v2: i32) -> Cell<i32> {
  let v3 : function!((i32,) -> Cell<i32>) = function!(cell::<i32>);
  let v4:Cell<i32> = call_indirect!((val!(v3))(val!(v2), ));
  return val!(v4);
}

This will map the function to an entry in a function pointer table:

enum Tag {
    cell,
    caller1,
}
struct FunctionTag<I, O>(Tag, PhantomData<(I, O)>); // PhantomData takes a single type parameter, so I and O are wrapped in a tuple
struct Function<I, O>(fn(I) -> O, FunctionTag<I, O>);

If we are given a tag, it should be possible to decode it into the corresponding physical function pointer:

fn decode<I,O>(tag: FunctionTag<I, O>) -> Function<I, O> {
    unsafe {
        match tag.0 {
            Tag::cell => Function(std::mem::transmute(cell as usize), tag), // ERROR: Type annotations needed on `cell`
            Tag::caller1 => Function(std::mem::transmute(caller1 as usize), tag),
        }
    }
}

Physical function pointers always point to an instantiated function. I mean, it's not possible to have a generic function pointer. If our logical function tag is generic, then we don't know which instance the physical function pointer should be decoded into.
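
The point about instantiation can be seen in plain Rust, independent of Arc. In the sketch below, identity and instantiate are hypothetical names chosen for illustration. A generic function only becomes a function pointer once its type parameter is fixed, either with an explicit turbofish or by inference from the expected pointer type.

```rust
// Hypothetical generic function standing in for something like `cell`.
fn identity<T>(x: T) -> T {
    x
}

pub fn instantiate() -> (fn(i32) -> i32, fn(bool) -> bool) {
    // Explicit instantiation: the turbofish fixes T = i32.
    let a: fn(i32) -> i32 = identity::<i32>;
    // Implicit instantiation: T = bool is inferred from the annotated
    // pointer type on the left-hand side.
    let b: fn(bool) -> bool = identity;
    (a, b)
}
```

There is no type such as fn(T) -> T with T left open, which is why a cast on an uninstantiated cell asks for type annotations.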

One solution is to make function tags name mangled:

fn decode<I,O>(tag: FunctionTag<I, O>) -> Function<I, O> {
    unsafe {
        match tag.0 {
            Tag::celli32 => Function(std::mem::transmute(celli32 as usize), tag),
            Tag::caller1 => Function(std::mem::transmute(caller1 as usize), tag),
        }
    }
}

Another solution could be to use strings, though this probably performs worse:

fn decode<I,O>(tag: FunctionTag<I, O>) -> Function<I, O> {
    unsafe {
        match tag.0 {
            "cell::<i32>" => Function(std::mem::transmute(cell::<i32> as usize), tag),
            "caller1" => Function(std::mem::transmute(caller1 as usize), tag),
        }
    }
}
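
A safe variant of the string-keyed idea could look like the following sketch. It assumes a single fixed signature (fn(i32) -> i32) so that no transmute is needed, and identity, caller1 and IntFn are hypothetical names. Unknown names yield None rather than being left unhandled.

```rust
// Assumption: every decodable function shares this one signature.
type IntFn = fn(i32) -> i32;

// Hypothetical generic function that must be instantiated before use.
fn identity<T>(x: T) -> T {
    x
}

// Hypothetical monomorphic function.
fn caller1(x: i32) -> i32 {
    x + 1
}

// Map a logical name to a physical function pointer.
pub fn decode(name: &str) -> Option<IntFn> {
    let ptr: IntFn = match name {
        // The string names the instance, so the turbofish is unambiguous.
        "identity::<i32>" => identity::<i32>,
        "caller1" => caller1,
        // Names that are not in the table cannot be decoded.
        _ => return None,
    };
    Some(ptr)
}
```

Matching on strings compares bytes rather than an integer discriminant, which is the likely source of the performance difference, although nothing here measures it.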

frej commented 2 years ago

> When we create the function pointer we need to use function!(...)

#364 should fix that.

segeljakt commented 2 years ago

I forgot to say, the idea of the logical tag is to preserve information about what a function pointer points to. This is needed when function pointers are transferred between OS processes (which can have different memory address space layouts). Functions are values so we need to create the logical tag simultaneously when creating the physical pointer, or else we might lose this information 👀
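
To make the idea concrete, here is a minimal sketch of a function value that carries its logical tag alongside the physical pointer. All names (Tag variants, double_i32, negate_i32, encode, decode) are hypothetical, the signature is fixed to fn(i32) -> i32 to avoid unsafe code, and a single byte stands in for whatever wire format would really be used.

```rust
// Logical, name-mangled tags. Explicit discriminants keep the byte
// encoding stable across builds (assumption of this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum Tag {
    DoubleI32 = 0,
    NegateI32 = 1,
}

fn double_i32(x: i32) -> i32 {
    x * 2
}

fn negate_i32(x: i32) -> i32 {
    -x
}

// A function value: the physical pointer plus the logical tag it came from.
#[derive(Clone, Copy)]
pub struct Function {
    ptr: fn(i32) -> i32,
    tag: Tag,
}

impl Function {
    // Pointer and tag are created together, so the tag is never lost.
    pub fn from_tag(tag: Tag) -> Function {
        let ptr: fn(i32) -> i32 = match tag {
            Tag::DoubleI32 => double_i32,
            Tag::NegateI32 => negate_i32,
        };
        Function { ptr, tag }
    }

    pub fn call(&self, x: i32) -> i32 {
        (self.ptr)(x)
    }

    // Only the tag is sent; the raw address is meaningless in another process.
    pub fn encode(&self) -> u8 {
        self.tag as u8
    }

    // The receiver rebuilds the pointer from the tag in its own address space.
    pub fn decode(byte: u8) -> Option<Function> {
        let tag = match byte {
            0 => Tag::DoubleI32,
            1 => Tag::NegateI32,
            _ => return None,
        };
        Some(Function::from_tag(tag))
    }
}
```

In this sketch the sending side calls encode, and the receiving side calls decode and then call. The address stored in ptr may differ between the two processes, but the tag identifies the same function in both.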