cmajor-lang / cmajor

The Cmajor public repository
https://cmajor.dev

Solution to API language interoperability #35

Closed 0xchase closed 7 months ago

0xchase commented 7 months ago

Happy to see the compiler has been open sourced!

I was looking into generating bindings to the cmajor API to interop with other languages and ran into a few hurdles. Since API/ is all C++ I assumed the best approach would be to build bindings to COM/, but it seems that this interface is implemented in C++ as well, which would preclude automatic binding generation with something like bindgen or ffigen. Furthermore, it looks like the COM interface returns C++ data structures like choc::com::Ptr<ProgramInterface>, which again would make interop complicated and potentially bug-prone.

Obviously, I could implement my own C wrapper or reverse engineer the in-memory representation of the data structures returned by the COM API, but it seems to me like having an included C API would benefit the community. If it was me I would have designed the COM API to return C types and added the choc C++ abstractions in API/, but perhaps there's a reason I'm not aware of that the compiler must return smart pointers, or perhaps you all had a better approach to interop in mind.

If there's interest in adding a C API I'd be happy to implement it with some guidance on the design. If there's a better approach to interop, let me know that as well.

julianstorer commented 7 months ago

Hi there, great that you'd like to get stuck in!

Which languages are you aiming to target?

You won't find any C fans over here, and the idea of adding a clunky C API that we'd need to keep in sync with our C++ one makes me shudder..

But.. I've never used bindgen or ffigen, and my instinct here would be that since our COM classes just follow the standard pattern, then it probably wouldn't take much of a tweak to make our classes compatible with those tools.

Don't get confused by things like choc::com::Ptr<ProgramInterface> - we have helper classes that use smart pointers like that in the cmajor/API folder, but those are just wrappers around the raw COM classes in the cmajor/COM folder, and those are all traditional bare pointers with AddRef/Release semantics etc. If it takes a little bit of tweaking their shape to make them fit the pattern that other tools expect, that's something that we'd be happy to change.
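To make the "wrapper around a bare pointer with AddRef/Release" idea concrete, here is a minimal Rust sketch of such a wrapper. Everything in it is an assumption for illustration: `RawObject`, `RawVtable`, `ComPtr` and the mock functions are hypothetical names, and the object is a mock built in Rust, not a real Cmajor or choc class.

```rust
use std::ptr::NonNull;

// Hypothetical stand-in for a raw COM-style object: its first field is a
// pointer to a table of function pointers.
#[repr(C)]
pub struct RawObject {
    vtable: *const RawVtable,
    ref_count: i32, // mock-only state so the example is observable
}

#[repr(C)]
pub struct RawVtable {
    add_ref: unsafe extern "C" fn(*mut RawObject) -> i32,
    release: unsafe extern "C" fn(*mut RawObject) -> i32,
}

unsafe extern "C" fn mock_add_ref(obj: *mut RawObject) -> i32 {
    unsafe {
        (*obj).ref_count += 1;
        (*obj).ref_count
    }
}

unsafe extern "C" fn mock_release(obj: *mut RawObject) -> i32 {
    unsafe {
        (*obj).ref_count -= 1;
        let remaining = (*obj).ref_count;
        if remaining == 0 {
            // The last reference is gone, so the mock frees itself.
            drop(Box::from_raw(obj));
        }
        remaining
    }
}

static MOCK_VTABLE: RawVtable = RawVtable {
    add_ref: mock_add_ref,
    release: mock_release,
};

// Creates a heap-allocated mock object holding one reference.
pub fn new_mock_object() -> *mut RawObject {
    Box::into_raw(Box::new(RawObject { vtable: &MOCK_VTABLE, ref_count: 1 }))
}

// Owning handle: Clone calls add_ref, Drop calls release.
pub struct ComPtr(NonNull<RawObject>);

impl ComPtr {
    /// Takes over one existing reference. The pointer must be null or valid.
    pub unsafe fn from_raw(raw: *mut RawObject) -> Option<Self> {
        NonNull::new(raw).map(ComPtr)
    }
}

impl Clone for ComPtr {
    fn clone(&self) -> Self {
        let raw = self.0.as_ptr();
        unsafe { ((*(*raw).vtable).add_ref)(raw) };
        ComPtr(self.0)
    }
}

impl Drop for ComPtr {
    fn drop(&mut self) {
        let raw = self.0.as_ptr();
        unsafe { ((*(*raw).vtable).release)(raw) };
    }
}

// Test helper that peeks at the mock's counter.
pub fn mock_ref_count(ptr: &ComPtr) -> i32 {
    unsafe { (*ptr.0.as_ptr()).ref_count }
}
```

Wrapping `new_mock_object()` with `ComPtr::from_raw` gives a handle that can be cloned and dropped freely, and the mock is freed when the last handle goes away.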

0xchase commented 7 months ago

I'm primarily looking to target Rust and possibly Dart (for use with the Flutter framework).

Ah okay, I see. You're right, I was confused by how the smart pointers worked. I assumed it was a fat pointer but now that I've had a closer look at the implementation I see how the reference counting works. So it shouldn't be a problem to implement the Rust interface. I'll let you know if I run into any issues.

julianstorer commented 7 months ago

Thanks! Good luck, will be curious to see how you get on..

0xchase commented 7 months ago

TLDR: This turned out to be a bit more of a mess than I expected and I think trying to bind to the existing COM interface is probably not a good solution.


It's fairly straightforward to load the cmajor dynamic library, find the cmajor_getEntryPoints function, and call the static functions in the Library class. As I start to interact with objects like the ProgramInterface, however, it starts to get a bit complicated.

Using the ProgramInterface as an example, I can manually construct a data structure that should correspond to the in-memory representation of its vtable, as shown below (in Rust).

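use std::os::raw::c_char;

// Method table for ProgramInterface. `extern "C"` makes each pointer use the
// platform C calling convention instead of Rust's unspecified one.
#[repr(C)]
pub struct ProgramInterfaceVtable {
    pub parse: unsafe extern "C" fn (
        *const ProgramInterface,
        filename: *const c_char,
        file_content: *const c_char,
        file_content_size: usize
    ) -> *mut ChocString,
    pub get_syntax_tree: unsafe extern "C" fn (
        *const ProgramInterface,
        &SyntaxTreeOptions
    ) -> *mut ChocString
}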

And then the in-memory representation of the object itself, which should start with the choc::com::Object vtable it inherits from, followed by its own vtable.

#[repr(C)]
pub struct ObjectVtable {
    add_ref: unsafe extern "C" fn (*const Object) -> i32,
    release: unsafe extern "C" fn (*const Object) -> i32,
    get_reference_count: unsafe extern "C" fn (*const Object) -> i32,
}

#[repr(C)]
pub struct Object; // Empty

// repr(C) keeps the fields in declaration order; without it Rust may reorder them.
#[repr(C)]
pub struct ProgramInterface {
    object_vtable: *const ObjectVtable,
    vtable: *const ProgramInterfaceVtable
}

And so on, as I started to do in https://github.com/0xchase/cmajor-rs/tree/main/src/com. The point of all these data structures is to let me reach the API functions I want through the layers of vtables. Some of this works and some of it doesn't. You would access the methods as shown below.

impl ProgramInterface {
    pub fn parse(&self, filename: &str, file_contents: &str) -> String {
        let filename = CString::new(filename).unwrap();
        let contents = CString::new(file_contents).unwrap();

        unsafe {
            let count = ((*self.object_vtable).add_ref)(self as *const ProgramInterface as *const Object);
            println!("Ref count is {}", count);

            let count = ((*self.object_vtable).add_ref)(self as *const ProgramInterface as *const Object);
            println!("Ref count is {}", count);

Unfortunately the rules about how vtables are implemented are more complicated than I remembered. As is written here, for example, the compiler may sometimes insert padding between an object's vtable pointer and other data. Or it might not. It might start with an offset. Or it might not.

This approach becomes a mess rather quickly. I think I could get it to work, but I think it's a bad idea, especially since it depends on the C++ ABI, which isn't stable or guaranteed to be consistent across compilers or versions. And it would be a pain to implement in a language that, unlike Rust, isn't naturally suited to low-level memory operations.

0xchase commented 7 months ago

I'm going to think it over and give it another go before reopening this issue, though.

0xchase commented 7 months ago

Solved. Got confused about how COM worked, again.
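The thread doesn't say what the confusion was. For anyone following along, the usual layout for a single-inheritance COM-style class on the common C++ ABIs is one vtable pointer at the start of the object, with the base interface's methods in the first slots of that table and the derived interface's methods after them. The sketch below shows that shape with a mock built entirely in Rust. `MockProgram`, `MockProgramVtable` and `source_length` are hypothetical names, and the slot order is an assumption, not something taken from the Cmajor headers.

```rust
use std::ffi::{c_char, CStr, CString};

// One table for the whole object: inherited slots first, then the derived
// interface's own methods in declaration order (assumed layout).
#[repr(C)]
pub struct MockProgramVtable {
    add_ref: unsafe extern "C" fn(*mut MockProgram) -> i32,
    release: unsafe extern "C" fn(*mut MockProgram) -> i32,
    source_length: unsafe extern "C" fn(*const MockProgram, *const c_char) -> usize,
}

#[repr(C)]
pub struct MockProgram {
    vtable: *const MockProgramVtable, // the only hidden pointer in the object
    ref_count: i32,                   // mock-only state
}

unsafe extern "C" fn mock_add_ref(p: *mut MockProgram) -> i32 {
    unsafe {
        (*p).ref_count += 1;
        (*p).ref_count
    }
}

unsafe extern "C" fn mock_release(p: *mut MockProgram) -> i32 {
    unsafe {
        (*p).ref_count -= 1;
        (*p).ref_count
    }
}

// Hypothetical "derived" method: returns the byte length of a C string.
unsafe extern "C" fn mock_source_length(_p: *const MockProgram, text: *const c_char) -> usize {
    unsafe { CStr::from_ptr(text) }.to_bytes().len()
}

static MOCK_PROGRAM_VTABLE: MockProgramVtable = MockProgramVtable {
    add_ref: mock_add_ref,
    release: mock_release,
    source_length: mock_source_length,
};

pub fn new_mock_program() -> MockProgram {
    MockProgram { vtable: &MOCK_PROGRAM_VTABLE, ref_count: 1 }
}

impl MockProgram {
    // Base-interface call: goes through a slot at the start of the table.
    pub fn add_ref(&mut self) -> i32 {
        let this: *mut MockProgram = self;
        unsafe { ((*(*this).vtable).add_ref)(this) }
    }

    pub fn release(&mut self) -> i32 {
        let this: *mut MockProgram = self;
        unsafe { ((*(*this).vtable).release)(this) }
    }

    // Derived-interface call: same vtable pointer, later slot.
    pub fn source_length(&self, text: &str) -> usize {
        let text = CString::new(text).expect("text must not contain NUL bytes");
        let this: *const MockProgram = self;
        unsafe { ((*(*this).vtable).source_length)(this, text.as_ptr()) }
    }
}
```

With this shape, both the reference-counting methods and the interface's own methods are reached through the same pointer, so no second vtable field is needed in the Rust struct.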

0xchase commented 7 months ago

One thing I noticed... if I call createEngine on an EngineFactoryInterface with an invalid argument, it seems to throw an exception (the Rust runtime fails with the message fatal runtime error: Rust cannot catch foreign exceptions). It would be better for interop to return an error code, since non-C++ code won't be able to catch exceptions like this.

julianstorer commented 7 months ago

Ah, if any of our COM methods are throwing, then that's a bug. Thanks for the heads-up, I'll have a look and make sure we trap those..
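As a rough illustration of the error-code style suggested above, here is a minimal Rust sketch of a caller mapping a status code and out-parameter onto a `Result`. The status values, `mock_create_engine`, `MockEngine` and `EngineError` are all hypothetical and only mimic the shape of such a boundary. This is not the Cmajor API, and the mock is written in Rust so the example runs on its own.

```rust
use std::ffi::{c_char, CStr, CString};

// Hypothetical status codes for the boundary function.
const STATUS_OK: i32 = 0;
const STATUS_INVALID_ARGUMENT: i32 = 1;

// Hypothetical engine object handed back through an out-parameter.
#[repr(C)]
pub struct MockEngine {
    id: u32,
}

// Mock boundary function: reports failure via its return value rather
// than unwinding across the language boundary.
unsafe extern "C" fn mock_create_engine(
    engine_type: *const c_char,
    out: *mut *mut MockEngine,
) -> i32 {
    if engine_type.is_null() || out.is_null() {
        return STATUS_INVALID_ARGUMENT;
    }
    let name = unsafe { CStr::from_ptr(engine_type) };
    if name.to_bytes() != b"mock" {
        return STATUS_INVALID_ARGUMENT;
    }
    unsafe { *out = Box::into_raw(Box::new(MockEngine { id: 1 })) };
    STATUS_OK
}

#[derive(Debug, PartialEq)]
pub enum EngineError {
    InvalidArgument,
    Unknown(i32),
}

// Safe wrapper: turns the status code into an idiomatic Result.
pub fn create_engine(engine_type: &str) -> Result<Box<MockEngine>, EngineError> {
    let name = CString::new(engine_type).map_err(|_| EngineError::InvalidArgument)?;
    let mut raw: *mut MockEngine = std::ptr::null_mut();
    let status = unsafe { mock_create_engine(name.as_ptr(), &mut raw) };
    match status {
        // The mock allocates with Box, so reclaiming with Box is sound here.
        STATUS_OK if !raw.is_null() => Ok(unsafe { Box::from_raw(raw) }),
        STATUS_INVALID_ARGUMENT => Err(EngineError::InvalidArgument),
        other => Err(EngineError::Unknown(other)),
    }
}
```

A caller can then handle a bad argument as an ordinary `Err` value instead of the process aborting on a foreign exception.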