NSoiffer / MathCAT

MathCAT: Math Capable Assistive Technology for generating speech, braille, and navigation.
MIT License
61 stars 35 forks source link

Consider using Diplomat #193

Open sffc opened 1 year ago

sffc commented 1 year ago

Hi there,

I'm from the team working on ICU4X. Building a Rust library with cross-language bindings was important to us, so we built a rool called Diplomat to automatically generate bindings and documentation and keep them up to date in all supported languages. You can take a look here:

https://github.com/rust-diplomat/diplomat

CC @Manishearth

Manishearth commented 1 year ago

I'm also happy to help folks understand the tool better to use in integration; the book does document a bunch of stuff but not everything

NSoiffer commented 1 year ago

Thanks for pointing out your code. It does look potentially helpful. The link mentioned to "the book" is not right, so please fix that.

I briefly looked at the documentation. I didn't see any mention of handling "Result"/errors. Is that handled? Other than that, MathCAT's interface tries to use very simple types. But it does heavily rely on returning strings which raises memory handling issues for C users that aren't an issue for Python and JS users since they get hidden behind the current interfaces. Is that an issue for Diplomat? My C interface therefore is a bit different than my Python interface

I've been pretty happy with the pyo3 Python bindings library. Would diplomat be able to create something that is as simple to use?

FYI: the interface MathCAT wants to expose is in src/interface.rs. As hinted at above, it is exposed in slightly different ways in each of the current interfaces in other projects.

Manishearth commented 1 year ago

link fixed.

We do support result types, they turn into a union in C, a std::variant-wrapper in C++, and an exception in JS

we have a way to return strings that is generic over the requirements of the receiver language

note that we do not have a Python backend but would happily support work to add one if it was desired.

NSoiffer commented 1 year ago

The C, Python, and Java interfaces are currently used by others, so they need to be supported. I also use WASM for a web demo and would like to build the demo on top of a JS interface instead of the current mess being used (web-sys is vast overkill).

Of course, just because I start using Diplomat for one or two languages, doesn't mean it has to work for others (such as Python).

The documentation on Writeable only covers someone passing in a "String", not Rust code returning Strings. That's where there is some ugliness right now dealing with strings in C since users need to call back to Rust to free it up.

It would be really helpful for me to understand how Diplomat would handle Strings/Result if you could show me the code for

pub fn set_mathml(mathml_str: String) -> Result<String> {...}

The current C code is

pub extern "C" fn SetMathML(mathml_str: *const c_char) -> *const c_char {
    return set_string_error(
        set_mathml(safe_string(mathml_str))
    );
}

The current interface returns an empty string to indicate an error and GetError() is called to get the error string. FreeMathCATString() needed to be called to free any string. All pretty ugly. It sounds like Diplomat will simplify this.

With pyo3, there is none of this mess and the calling code acts just as if it were calling a python function.

What does the Rust/Diplomat code look like for set_mathml and what would the calling C code look like?

sffc commented 1 year ago

Returning strings "just works" in backends that support returning allocated strings (i.e. C++ but not C).

https://github.com/rust-diplomat/diplomat/blob/main/example/src/fixed_decimal.rs#L23

        #[diplomat::rust_link(fixed_decimal::FixedDecimal::write_to, FnInStruct)]
        pub fn to_string(&self, to: &mut DiplomatWriteable) -> Result<(), ()> {

generates this C++:

inline diplomat::result<std::string, std::monostate> ICU4XFixedDecimal::to_string() const {
Manishearth commented 1 year ago

The documentation on Writeable only covers someone passing in a "String", not Rust code returning Strings. That's where there is some ugliness right now dealing with strings in C since users need to call back to Rust to free it up.

No, Writeable is about Rust code returning strings, the Writeable type is about passing in strings (that can then be sent to other languages)

It would be really helpful for me to understand how Diplomat would handle Strings/Result if you could show me the code for

In addition to what Shane said, under the hood that function would use a C API that returns a union, and accepts a writeable object that wraps a sink for writing to a string

Manishearth commented 1 year ago

So for your API I would write something like this:

#[diplomat::bridge]
mod ffi {
    use diplomat_runtime::DiplomatWriteable;
    use std::fmt::Write;

    #[diplomat::opaque]
    struct Thingy;

    impl Thingy {
        pub fn set_mathml(input: &str, out: &mut DiplomatWriteable) -> Result<(), ()> {
            out.write_str(input).map_err(|_| ())?; // just turn the input into output. you could do something more complex if you want
            Ok(()) // just return Ok
        }
    }
}

In the c2 backend (diplomat-tool c2 outputfolder/, "C2" is the new C backend, prefer this), this generates:

// diplomat_result_void_void.h
typedef struct diplomat_result_void_void {
  bool is_ok;
} diplomat_result_void_void;

// Thingy.h
// ...
diplomat_result_void_void Thingy_set_mathml(const char* input_data, size_t input_len, DiplomatWriteable* writeable);
// ...

DiplomatWriteable is a type that contains callbacks for writing to arbitrary sinks. A simple way of creating one is here:

https://github.com/rust-diplomat/diplomat/blob/54e85c813508acc1608153ffdff8c55e69a1af3a/example/c2/main.c#L16-L17

In C++ (diplomat-tool cpp2 outputfolder/) you get:

// Thingy.hpp (technically Thingy.d.hpp)
// ...
class Thingy {
public:

  inline static diplomat::result<std::string, std::monostate> set_mathml(std::string_view input);
// ...

so it returns a nice string, and accepts a string view.

We do not currently support free functions (we would like to, but nobody has needed it yet), so the stuff still lives on a type.

NSoiffer commented 1 year ago

Thanks for the example, that helps me understand it a little better. Just to make sure I understand the C++ interface correctly:

  1. The input string becomes a `const char*1 and a length
  2. The result string is passed in as an arg of type DiplomatWriteable*
  3. The return type is diplomat_result_void_void

Presumably for MathCAT, "class Thingy" would be something like "class MathCAT" so that calls look like

char buffer[1000];
DiplomatWriteable result = diplomat_simple_writeable(buffer, 1000);
MathCAT_set_mathml(mathml, strlen(mathml), &result);

with maybe an if wrapped around it to test the return. Is that correct?

I suspect it would be better not to have a fixed size buffer as the input string size is only roughly correlated with the output string size. I didn't see any examples of that in your examples dir. How is that done? If it is really ugly, I'm sure a very large buffer would suffice as no one would want to listen to a math expression that is over 200(?) words long.

Also, how does one get at the error message? The "result" type appears to boil down to be a bool. But errors in Rust have a string/error message associated with them.

At least one of the users needs a C interface -- they are using Delphi as their programming language and make use of the C interface DLL. What does that interface look like and what would be some sample calling code?

Manishearth commented 1 year ago

Just to make sure I understand the C++ interface correctly:

C, not C++. C++ is nicer; it takes a std::string_view and returns a result wrapper around a std::string. I showed it above.

I suspect it would be better not to have a fixed size buffer as the input string size is only roughly correlated with the output string size. I didn't see any examples of that in your examples dir. How is that done?

The API we provide by default is diplomat_buffer_writeable_create(). However, you can hook up your own buffer type, e.g. in C++ we have hooked it up to write to a std::string.

Also, how does one get at the error message? The "result" type appears to boil down to be a bool. But errors in Rust have a string/error message associated with them.

Just use an error type in the Result and give it whatever methods you need.

A real world example is ICU4X's live usage of Diplomat (https://github.com/unicode-org/icu4x/tree/main/ffi/capi), though we do not yet enable the C2 and CPP2 backends[^1].

In ICU4X we return a Result<(), ICU4XError>

where ICU4XError is a giant enum

Over C that turns into diplomat_result_void_ICU4XError which contains a union [^2].

If you end up returning a Result<T, U> you get an enum like diplomat_result_box_ICU4XFixedDecimalFormatter_ICU4XError which has a proper union inside it

At least one of the users needs a C interface -- they are using Delphi as their programming language and make use of the C interface DLL. What does that interface look like and what would be some sample calling code?

You've been looking at the C interface. The C++ interface is nicer since it has generics and stuff, but the stuff you've been asking about here is the C interface. A high quality C interface is not a priority for us but we do try to make it decent and don't mind suggestions on improvement for the C2 backend.

[^1]: they're slightly different API-wise from C/CPP and we don't want to release ICU4X using them until we're ready with all the new features we want [^2]: Yes, that union should have a second empty variant; this is a bug in the C backend; C2 doesn't have this problem.

NSoiffer commented 1 year ago

I received some more feedback and I need both the C and C++ interfaces, plus I'd really like to have a JS interface. I think I'll stick with the python interface I have for the moment.

I'm definitely interested in using Diplomat. I know it is asking a lot, but I suspect you could add the interface in less than 10% of the time it would take me. Is that something you would be willing to do? The functions that need an interface are in src/interface.rs and you can see the current C interface by looking at https://github.com/NSoiffer/MathCATForC/blob/main/src/lib.rs. In summary, the number of functions is not that large:

SetRulesDir
SetMathML
GetVersion
GetSpokenText
SetPreference
GetPreference
GetBraille
DoNavigateKeyPress
DoNavigateCommand
GetNavigationMathMLId
GetNavigationMathML

and one function that returns a struct

pub struct NavigationLocation {
    id: *const c_char,
    offset: i32,
}
pub extern "C" fn GetNavigationLocation() -> NavigationLocation

If doing all of them is too big an ask, how about showing the C/C++/JS interfaces for GetPreference and GetNavigationLocation so I see an example of how they are done for different args/return values?

I created a diplomat branch to work in if you are willing to help me use diplomat.

Thanks for any help you can give.

Manishearth commented 1 year ago

I need both the C and C++ interfaces, plus I'd really like to have a JS interface

Yep, we have all three. The C and C++ ones have been recently redone to be better, the JS one I'm planning on redoing but you can still use the old one.

Is that something you would be willing to do?

This does seem like a relatively small surface area, I can try ... no promises, and this would be low priority for me, and I might just do it partially. I'll keep you posted.

NSoiffer commented 1 year ago

I appreciate whatever you can do. Maybe hold off on the JS version if you are going to redo it. Up to you.

Manishearth commented 1 year ago

It's easy enough to generate, but yeah perhaps worth waiting on advertising it

NSoiffer commented 10 months ago

This does seem like a relatively small surface area, I can try ... no promises, and this would be low priority for me, and I might just do it partially. I'll keep you posted.

@Manishearth: any chance this might happen soon?

Manishearth commented 10 months ago

Pretty busy right now, can't promise anything.

@sffc might be interested in trying?

NSoiffer commented 10 months ago

I have someone who has maybe a week's time, but no Rust experience. I could possibly get him to try it, but I'm worried that he'll spend most of his time trying to understand Rust and would be better off doing a different project. Any idea whether after some initial set up, Rust experience is needed?

Manishearth commented 10 months ago

Diplomat's API definitions are also a subset of Rust, so while I don't think much Rust experience is required, it may be annoying for them to have to pick it up this way.

NSoiffer commented 10 months ago

Thanks. I won't suggest this to him.