Foreign Function Interface

DemiMarie commented 7 years ago

Perhaps one of the few essential parts of allowing cross-implementation SML code to be written is a standardized Foreign Function Interface. This is because an FFI is needed if one wishes to use features that are provided by third-party libraries or the operating system, but are not provided by one's implementation.

A good FFI should (note that these are my personal views):

allow calling almost all C functions.
allow for reading and writing C data structures.
allow for exporting functions that can be called from C.
support cleanup of foreign resources when SML values become unreachable.
not require users to write C stub code by hand. Generating C automatically during compilation is allowed.
be independent of implementation details, such as the representation of SML values.
not cause unnecessary performance degradation.
support SML code being compiled into a library that is called into by foreign code.
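
To make the resource-cleanup requirement concrete, here is a minimal sketch in Rust (used for illustration because it is discussed later in this thread). The names `res_open`, `res_close`, `Resource`, and `live_handles` are hypothetical stand-ins, and the "foreign" functions are written in Rust so the example is self-contained. Rust releases the handle deterministically when the value goes out of scope; a garbage-collected SML implementation would presumably attach a finalizer to the wrapper instead.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-ins for a C library's open/close pair, written in Rust
// so the sketch is self-contained. LIVE counts handles not yet closed.
static LIVE: AtomicUsize = AtomicUsize::new(0);

extern "C" fn res_open() -> usize {
    LIVE.fetch_add(1, Ordering::SeqCst) + 1
}

extern "C" fn res_close(_handle: usize) {
    LIVE.fetch_sub(1, Ordering::SeqCst);
}

/// Safe wrapper: owning a `Resource` means owning one open foreign handle.
pub struct Resource {
    handle: usize,
}

impl Resource {
    pub fn open() -> Resource {
        Resource { handle: res_open() }
    }
}

impl Drop for Resource {
    // Runs when the wrapper goes out of scope, so user code never has to
    // call `res_close` by hand.
    fn drop(&mut self) {
        res_close(self.handle);
    }
}

/// Number of foreign handles that are currently open.
pub fn live_handles() -> usize {
    LIVE.load(Ordering::SeqCst)
}
```

Opening two resources inside a block and leaving the block brings `live_handles()` back to zero without any explicit close call.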

MLton has an FFI that I believe supports all of these. I am not familiar with any others.

JohnReppy commented 7 years ago

I agree that a standard FFI is an important goal. The MLton FFI is a reasonable place to start, but it does have some issues. It makes some significant assumptions about SML runtime representations that might be hard to implement without whole-program compilation. Also, the callback support is too static; it is not possible to have multiple instances of a callback at runtime.

DemiMarie commented 7 years ago

@JohnReppy Having thought further, I agree.

On reflection, the best choice seems to be Haskell's FFI, adapted to SML syntax. It has some weaknesses (the need for external binding generators to handle C structs reasonably, for one), but it is battle-tested and does not require whole-program compilation or monomorphization. It does not require C stubs either. Haskell's FFI uses type classes (mostly Storable, as I understand it), so this might require the inclusion of #18 (modular type classes).

eduardoleon commented 4 years ago

More important than a foreign function interface, I think, are standardized primitives for unsafe programming directly in ML, without having to reach for C code. One lesson we should learn from Rust is that modularity is useful for wrapping unsafe implementations of safe abstractions. Standard ML already has a more sophisticated module system than Rust's, but it lacks the unsafe primitives necessary to write pure ML implementations of

Dynamically sized vectors and double-ended queues, backed by an array of potentially uninitialized elements. (This is not the same thing as an array of options, all initialized to NONE.)
Hash tables, without the overhead of bounds-checked array indexing. (The bounds check is superfluous, because it is easy to prove that the implementation will never use invalid array indices.)
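
For comparison, this is roughly what such a module looks like in Rust: a safe interface around an array of possibly uninitialized slots, with unchecked indexing justified by a length invariant. It is only a sketch; `Stack` and its methods are made-up names, and an SML version would need the unsafe primitives this comment asks for.

```rust
use std::mem::MaybeUninit;

/// Fixed-capacity stack backed by an array of possibly uninitialized slots.
/// Invariant: `len <= N` and `slots[..len]` are initialized.
pub struct Stack<T, const N: usize> {
    slots: [MaybeUninit<T>; N],
    len: usize,
}

impl<T, const N: usize> Stack<T, N> {
    pub fn new() -> Self {
        Stack {
            // No element is constructed here; every slot stays uninitialized.
            slots: std::array::from_fn(|_| MaybeUninit::uninit()),
            len: 0,
        }
    }

    /// Pushes a value, or hands it back if the stack is full.
    pub fn push(&mut self, value: T) -> Result<(), T> {
        if self.len == N {
            return Err(value);
        }
        // SAFETY: `len < N` was just checked, so the index is in bounds.
        unsafe { self.slots.get_unchecked_mut(self.len).write(value) };
        self.len += 1;
        Ok(())
    }

    /// Pops the most recently pushed value, if any.
    pub fn pop(&mut self) -> Option<T> {
        if self.len == 0 {
            return None;
        }
        self.len -= 1;
        // SAFETY: the slot at the old `len - 1` was initialized by `push`,
        // is in bounds, and is never read again after this move.
        Some(unsafe { self.slots.get_unchecked(self.len).assume_init_read() })
    }
}

impl<T, const N: usize> Drop for Stack<T, N> {
    // Drop only the initialized prefix.
    fn drop(&mut self) {
        while self.pop().is_some() {}
    }
}
```

Callers see only `new`, `push`, and `pop`; the two `unsafe` blocks are justified locally by the invariant stated on the struct.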

RobertHarper commented 4 years ago

Hello,

Thanks for the suggestion; I would bet it would be controversial. My personal opinion is that it is a good thing that SML is safe, at least as a language design. It's infamous that "I know what I'm doing" is rarely a maintainable state of affairs. The person who wrote the original code might well, but that doesn't scale over space or time (multiple developers, or developers, even the original, over time).

But I appreciate the Rust lesson, and I appreciate that reasonable people can disagree about these matters. In particular, having an FFI is already problematic but essential, and one could therefore argue: why bother being safe? That's an old controversy that appears not to be easily resolvable.

Best, Bob

ratmice commented 4 years ago

To me at least, before entertaining unsafe, I'd want to be able to reason that some chunk of my program is limited to the safe subset of the language. This isn't something the Rust mechanism gives easily.

In Rust you basically have to bypass the build system and run the compiler directly, since it defaults to allowing unsafe for all transitive dependencies; when running the compiler directly, you can flip the setting from allow to forbid unsafe.
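
For a single crate or module, as opposed to its dependencies, Rust can reject `unsafe` through the `unsafe_code` lint. A minimal sketch, with `safe_only` and `first_or_zero` as made-up names:

```rust
// The lint covers everything inside this module, but not other modules
// and not any dependency of the crate.
#[forbid(unsafe_code)]
pub mod safe_only {
    /// Returns the first element, or 0 for an empty slice.
    pub fn first_or_zero(xs: &[i32]) -> i32 {
        // Replacing the body with the line below would be rejected
        // at compile time because of the lint:
        //     unsafe { *xs.get_unchecked(0) }
        xs.first().copied().unwrap_or(0)
    }
}
```

This gives a per-module guarantee only; it says nothing about code pulled in transitively.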

While the Basis Library has optional modules, I don't recall any optional primitives in the language itself, and I don't recall any implementation that enables or disables the optional Basis modules based on a compile-time flag or anything similar.

If there were a mechanism in place to restrict the language to the safe subset (probably by default), which allowed mixing of safe and unsafe compilation units, I wouldn't really see a problem with it. Given that the language doesn't really specify any build mechanism besides use, how would it work? A use_unsafe that raises a compiler error when in safe_only mode?

This all most likely belongs in its own issue rather than the FFI one, I would think...

eduardoleon commented 4 years ago

I see an unsafe subset of ML as a net improvement over calling C, for the following reasons:

On the usability front, an unsafe subset of ML would still have parametric polymorphism, algebraic data types and pattern matching, modules and abstract types, etc. Passing complicated values to an unsafe ML function would require no marshalling. You could still write and test your programs in a REPL.

On the verification front, an unsafe subset of ML would still have a formal semantics. The interaction between safe ML and unsafe ML code would be easier to understand than the interaction between ML and C code. For obvious reasons, there would be no theorem saying that no well typed term evaluates to wrong. But you could still prove yourself that the term you have written does in fact never evaluate to wrong.

On the cultural front, reasonable programmers would only use unsafe features sparingly and in small modules. Given how expressive ML is, these modules could be under 150-200 lines of code, and perhaps a lot less.

The serious use case I envision for an unsafe subset of ML is implementing numerical methods libraries. Presently, when I want to do numerical linear algebra, I have to reach for Python and R. I really wish I could reach for ML instead.

@ratmice raises an important issue. Unsafe features need some gatekeeping, so that the path of least resistance is to use safe features only.

DemiMarie commented 4 years ago

> I agree that a standard FFI is an important goal. The MLton FFI is a reasonable place to start, but it does have some issues. It makes some significant assumptions about SML runtime representations that might be hard to implement without whole-program compilation.

True, but those are also the representations you need if you want good performance. Otherwise, you are easily losing an order of magnitude in performance, and I really really do not want people avoiding polymorphic code on performance grounds.

> Also, the callback support is too static; it is not possible to have multiple instances of a callback at runtime.

That is C’s fault, and is why virtually all C libraries support passing a user-provided void* parameter to any callbacks. Generating C callbacks at runtime requires dynamic code generation.
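
The user-data pattern can be sketched as follows, in Rust for illustration. One statically compiled trampoline serves every closure instance, and the closure itself travels through the `void*` parameter, so no code is generated at runtime. `c_style_for_each` is a hypothetical stand-in for a C library function and is written in Rust here so the example is self-contained.

```rust
use std::os::raw::c_void;

/// Stand-in for a C function with the shape
/// `void for_each(int n, void (*cb)(int, void *), void *user_data)`.
pub extern "C" fn c_style_for_each(
    n: i32,
    cb: extern "C" fn(i32, *mut c_void),
    user_data: *mut c_void,
) {
    for i in 0..n {
        cb(i, user_data);
    }
}

/// Static trampoline: recovers the closure from `user_data` and calls it.
extern "C" fn trampoline<F: FnMut(i32)>(value: i32, user_data: *mut c_void) {
    // SAFETY: `user_data` was created from `&mut F` in `for_each` below,
    // and that borrow is still live for the duration of the call.
    let closure = unsafe { &mut *(user_data as *mut F) };
    closure(value);
}

/// Safe wrapper: any number of distinct closures can be passed at runtime.
pub fn for_each<F: FnMut(i32)>(n: i32, mut f: F) {
    c_style_for_each(n, trampoline::<F>, &mut f as *mut F as *mut c_void);
}
```

Two calls with different closures, for example one that sums its arguments and one that collects them into a vector, share the same mechanism. The limitation is the one noted above: it only works with C APIs that accept a user-data pointer.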

JohnReppy commented 4 years ago

The type/whole-program issue can be solved if you are willing to restrict array/vector arguments in the FFI to the MONO_ARRAY/MONO_VECTOR modules (e.g., CharVector), since these types can have a packed representation even with separate compilation. It may be better, however, to push the allocation of array data to the C side, since that avoids GC dangers.
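
As an illustration of why a packed monomorphic buffer is easy to hand across the boundary, the Rust sketch below passes a byte slice as a pointer and a length with no marshalling. `sum_bytes` is a hypothetical stand-in for a C function, written in Rust so the example is self-contained; it does not address the GC concerns mentioned above, since Rust has no moving collector.

```rust
/// Stand-in for a C function with the shape
/// `uint32_t sum_bytes(const uint8_t *data, size_t len)`.
extern "C" fn sum_bytes(data: *const u8, len: usize) -> u32 {
    // SAFETY: the caller promises `data` points to `len` readable bytes.
    let bytes = unsafe { std::slice::from_raw_parts(data, len) };
    bytes.iter().map(|&b| u32::from(b)).sum()
}

/// Safe wrapper: the packed buffer is passed as-is, without copying.
pub fn checksum(buf: &[u8]) -> u32 {
    sum_bytes(buf.as_ptr(), buf.len())
}
```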

You can implement dynamic callbacks with very minimal runtime code generation. All that you need is a template that you can copy and patch to dynamically generate a C function for a given SML closure.

DemiMarie commented 4 years ago

Some platforms, such as iOS and consoles, prohibit all dynamic code generation. libffi has an ugly hack for iOS, but I don’t think we should require it. Much better to make callbacks static and fix the broken C libraries.

I agree that allocating data from the C side is a better choice. In particular, passing data on the SML heap to C is unsafe in the presence of parallelism on the SML side.

DemiMarie commented 4 years ago

When it comes to representation, Rust’s solution is to only separately compile monomorphic functions. Polymorphic functions are compiled when used. Since monomorphic code is quite common, and since parsing and type-checking are still done per-crate (Rust’s compilation unit), this is practical.
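
A small sketch of the distinction, with made-up function names: `sum_f64` is monomorphic and can be compiled where it is defined, while `sum_generic` is type-checked where it is defined but only turned into machine code for each type it is used at.

```rust
use std::ops::Add;

/// Monomorphic: can be compiled to machine code once, in the defining crate.
pub fn sum_f64(xs: &[f64]) -> f64 {
    xs.iter().copied().fold(0.0, |acc, x| acc + x)
}

/// Polymorphic: type-checked here, but code is emitted per instantiation
/// (for example once for `i32` and once for `u8`) where it is used.
pub fn sum_generic<T: Copy + Default + Add<Output = T>>(xs: &[T]) -> T {
    xs.iter().copied().fold(T::default(), |acc, x| acc + x)
}
```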

JohnReppy commented 4 years ago

Some SML compilers take that approach for functors; i.e., compile them when they are applied. I'm not sure how well it would work for core-language polymorphism; I suspect that it would reduce to having to do whole-program monomorphization.

RobertHarper commented 4 years ago

Well, according to The Redefinition that Karl and I did, polymorphic functions are functors, which is especially natural when considering type classes such as equality types. So without special provision the two would be treated the same, which is what MLton does.

JohnReppy commented 4 years ago

I don't think that MLton treats them the same (I assume that you are talking about the implementation). Functors are eliminated before monomorphization using a specific defunctorization pass.

RobertHarper commented 4 years ago

Ok, but I think the effect is as if they were the same thing.

MatthewFluet commented 4 years ago

Yes, MLton first eliminates all module-level constructs (with code duplication at functor applications and simple renaming to eliminate structures) and then (after some intervening simplifications) eliminates polymorphism.

JohnReppy commented 4 years ago

Responding to Bob: I'm not sure what you mean by "effect" here. Functor application is beta-reduced at compile time, which means that the body of the functor is specialized to code, as well as types, whereas the specialization of polymorphism only specializes to types. If I implement the list-map function as a functor, then I know that I will get specialized versions for each application of the functor, whereas the polymorphic list-map function will be specialized to the type of the list elements, but not necessarily to different function arguments that have the same type.
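
A loose Rust analogy for this distinction, not a description of what MLton does: `map_static` is generic over the function's own type, so each distinct function argument gets its own specialized copy (specialization to code as well as types), whereas `map_dynamic` takes a function pointer and is specialized only to the element types. Both names are made up for the example.

```rust
/// Generic over the function type `F`: one specialized copy per distinct
/// function argument, which allows the call to `f` to be inlined.
pub fn map_static<T, U, F: Fn(&T) -> U>(xs: &[T], f: F) -> Vec<U> {
    xs.iter().map(f).collect()
}

/// Takes a function pointer: one copy per pair of types `(T, U)`, shared by
/// every function of that type, with `f` passed as a runtime value.
pub fn map_dynamic<T, U>(xs: &[T], f: fn(&T) -> U) -> Vec<U> {
    xs.iter().map(f).collect()
}
```

Both produce the same results; they differ only in what the compiler is able to specialize on.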

RobertHarper commented 4 years ago

Polymorphism is modelled by a functor taking a type as argument, and no other arguments. Type classes, such as equality types, take a type and the class operation as argument to the associated functor. It would make sense to me to inline the equality test, in those rare cases in which you actually use equality test and actually want the default equality. So it’s the same.

Bob
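
The reading of a type class as "a type plus the class operation" can be sketched in Rust as explicit dictionary passing. This is an illustration only; `member_with` and `member` are made-up names. In `member_with` the equality operation is an ordinary argument, while in `member` the compiler supplies the default equality through the `PartialEq` bound.

```rust
/// Explicit dictionary: the caller passes the equality operation alongside
/// the (inferred) type argument.
pub fn member_with<T>(eq: fn(&T, &T) -> bool, x: &T, xs: &[T]) -> bool {
    xs.iter().any(|y| eq(x, y))
}

/// Same function with the operation supplied implicitly by the trait bound,
/// which plays the role of the default equality.
pub fn member<T: PartialEq>(x: &T, xs: &[T]) -> bool {
    xs.iter().any(|y| x == y)
}
```

With a custom relation such as "same parity", `member_with` can answer differently from `member` on the same list, which is the case where the default equality is not the one wanted.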

YawarRaza7349 commented 4 months ago

FWIW, for anyone thinking about the "unsafe SML" thing, I think unsafe C# would be a closer analogue than Rust is. This demonstrates how garbage-collected memory could be used unsafely. These are some higher-level unsafe APIs of the sort that could be exposed to SML programmers. It'd also be a good source of "experience reports" on how people who had been programming in an initially safe language started using these features, how they incorporated them into their codebases, and whether it has worked well for them.

SMLFamily / Successor-ML

Foreign Function Interface #25