Closed shmish111 closed 4 years ago
How will the FFI work with multiple codegen backends? What happens when someone decides to target LLVM?
The back end is responsible for looking for the relevant file, so let's say you want to release a library with some ffi calls and you want it to be available for multiple back ends, you can create one file for each back end, the back end the user decides to use will look for file types it knows how to deal with. If a back end can deal with multiple types, for example chez and c then it could prioritize one over the other. First it looks for a declared ffi call name in the chez file and if it cannot find it then it looks in the c file. You can always have the compiler warn on conflicts etc.
I had a quick look at the Purescript FFI after you mentioned it the other day, and there's certainly things in there that might be useful to us. I don't think we should just copy it directly though, especially given that there are more things we can express in the Idris type system. Here's some unsorted but related thoughts:
Idris 1 aimed to make it easy to retarget to multiple languages, and I think we need to take this even more seriously in Idris 2. So functions which need an FFI need to take into account that they might be run on some subset of all the backends. Maybe they'd compile fine on some, but not all. Also, a backend might support calls to all sorts of languages, not just the backend language itself. Even now, the Chez backend can make foreign calls both to Scheme and C.
At the minute, we have two primitives for foreign calls:
%extern prim__cCall : (ret : Type) -> String -> (1 args : FArgList) ->
(1 x : %World) -> IORes ret
%extern prim__schemeCall : (ret : Type) -> String -> (1 args : FArgList) ->
(1 x : %World) -> IORes ret
...and corresponding higher level functions:
schemeCall : (ret : Type) -> String -> (1 args : FArgList) -> IO ret
cCall : (ret : Type) -> String -> FArgList -> IO ret
FArgList
is essentially a heterogeneous list which the backend uses to decide how to marshal the arguments to the foreign call. It's handy having dependent types here, because it means the core itself doesn't have to know anything about foreign functions! It's entirely the job of the backend to decide how to translate the arguments to the function calls. I imagine these primitives could be enriched, say with a pointer to the source file or library where the function is defined - this is currently done via %cg
directives, which are up to specific back ends to interpret.
Perhaps there could be a primitive foreignCall
which takes the approach you describe in looking for the relevant FFI glue code.
Since there already is an FFI (even if it is a fairly basic one) there'd have to be a pretty compelling reason to do something else. At the moment, I don't see any huge benefits of the Purescript approach over what we already have.
One final thing to remember: because we take the "multiple back ends" possibility seriously, a library doesn't necessarily know in advance which FFIs are going to be available. So functions which call into foreign libraries maybe need to be "open", in that they may be extended by different implementations in future for different backends. I have a half-baked idea about how to do this, which I should do something about and see if it works.
The reason I thought the Purescript style might fit nicely is precisely because a library doesn't really need to know which backend is going to be used although it does need to provide foreign files for any backend it supports.
I don't really want anything about the backend in my idr
code, so I don't want to describe how to find foreign code (%link
%include
etc) nor really what back ends I support, as you said above, it would be nice if a library using FFI calls was 'open' somehow and Idris code just said "make a foreign call to the backend". The author of library X would want to supply backend code that he knows would be used but it would be nice if someone else could write say java support for library X, as a library itself.
The question then is how do you inform the idris compiler which files to look in for foreign calls? Am I understanding all this correctly?
It's also important to remember when trying to solve this problem that I want to be able to tell idris through CLI options or ipkg file the location of things it needs to link to. I had previously tried to build https://github.com/idris-hackers/idris-array with bazel and it's tricky because the correct way to do it is probably to build the c part (Array.c) in a different task and then tell idris where to look for an .so
file. It looked to me as if there wasn't really a way to do this with idris 1, instead you have to build the correctly named object file in the correct location, convenient when using a supplied makefile but very tricky if you can't use make.
So I think I'm going back on myself a bit here, maybe what you could do is have a %foreignFileName
directive or something that describes the file base name that idris should look for to find the foreign names, in the example above %foreignFileName "Array"
and then a C backend could look for Array.so
somewhere and I could tell the idris command line which locations on my disk to search for that file.
So after writing all this, I'm starting to feel that the purescript idea of having fixed filenames might not be the best but requiring that a foreign file must have a particular base name might be good?
What is the reason for making it possible for a backend to call out to multiple different languages? If I want to call out to some c library and I'm using the chez backend, why not just use the chez FFI?
First, should I move this conversation to a new wiki page @edwinb ?
Just thinking about my previous comment, being able to call out to multiple different languages and having that specified in the code puts a lot more work on the shoulders of backend authors.
If the frontend only has one function, foreignCall
and then I (as an Idris user) want to call out to some C library but I'm using the Chez backend then the responsibility is on me to write Chez that calls out to C. Perhaps there is a java function that does what I want so with the JVM backend I call a java library and with the Chez backend I use the Chez FFI to call a C library.
The alternative seems to be that I write cCall
and I am forced to call the C library when using the JVM backend even though there might be a perfectly good Java library. In addition, the person who wrote the JVM backend has to write a way to pass through cCall
to the JVM FFI.
I don't really feel that it's a big deal for an Idris FFI user to have to write some FFI code in Chez to reach a C library. What are some other downsides?
A target might well have several calling conventions or other metadata that needs to specified to handle the call right, (C calling conventions, A wasm backend doing a wasm call or calling an imported JS function, C# using .NET methods or p/Invoke). It could be handled by decorating the string passed to foreignCall
in a suitable manner, but having several different kinds of foreign functions is common.
@melted I don't quite understand, can you give an example? Would Idris code need to know about these differences, couldn't it be handled in the foreign code?
Oh, I thought it was about Edwin's scheme, not Purescript's. I don't like the Purescript one, it requires lots of boilerplate, and not even boilerplate in Idris. Ideally, I would like to do without boilerplate at all. My dream FFI would allow me to write something like
stdio : C.Library
stdio = C.include "stdio.h"
hello : IO ()
hello = stdio.puts "Hello, world!"
This is obviously a fantasy and not legal even with type providers and stuff, not to mention all the hard work the machinery would take, but a man can dream.
I think it's a good way to look at this, to say "what would my dream case be"? My dream case though is to have no mention of the backend in my Idris code. Indeed I feel that I should be able to write Idris code without knowing what backend will be used and it should be enough to say that the backend must provide a function with this name and of this type. In my case I think the dream part is also the types but we can get very close with a purescript style ffi.
Just to go further, I would not want any mention of the backend in my Idris code which as well as the C.Library
includes the mention of the FFI file that the compiler needs to link to, in your example stdio.h
. For me this should be provided to the compiler by means of either configuration or convention but it should not be mentioned in Idris code.
What do you think?
That's something different from the FFI though. The point of an FFI is to write libraries that use existing libraries on the target platform. And since that platform won't use dependant types (at least not for a long time), you will have to know a lot of details about those target language libraries to write an Idris library for them.
What you're describing is simply an Idris interface, where someone else has written an implementation suitable for whatever your target is, which will probably have at least some components at a low level using libraries in the target language. But long term what we should be doing is having as much as possible written in Idris (or very long term some sort of dependantly typed intermediate language), and have any foreign calls as few and low level as possible.
On Sun., Jul. 21, 2019, 19:36 David Smith, notifications@github.com wrote:
I think it's a good way to look at this, to say "what would my dream case be"? My dream case though is to have no mention of the backend in my Idris code. Indeed I feel that I should be able to write Idris code without knowing what backend will be used and it should be enough to say that the backend must provide a function with this name and of this type. In my case I think the dream part is also the types but we can get very close with a purescript style ffi.
Just to go further, I would not want any mention of the backend in my Idris code which as well as the C.Library includes the mention of the FFI file that the compiler needs to link to, in your example stdio.h. For me this should be provided to the compiler by means of either configuration or convention but it should not be mentioned in Idris code.
What do you think?
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/edwinb/Idris2/issues/19?email_source=notifications&email_token=AA2CKBCARM7DSOTCNEHK7JDQATXGLA5CNFSM4IEFEMY2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2ON6YY#issuecomment-513597283, or mute the thread https://github.com/notifications/unsubscribe-auth/AA2CKBBNF5IFZHCK2RSG3F3QATXGLANCNFSM4IEFEMYQ .
Agreed, IMHO an FFI should be mostly used as a stop gap until Idris has every library it needs written in Idris. There will always be legitimate use cases for an FFI but they should be few and far between. I feel that a little extra boiler plate in the backend language is worth avoiding adding complexity to the backend compiler and the build system.
The problem with this type of opinion though is that I've only used languages and FFIs in certain ways so I would be interested to see more detailed counter arguments to my opinion.
If I want to call C from Idris, and I'm using the chez backend, requiring the FFI call to be to Chez means that there is an extra level of boilerplate, and thus an extra level of unchecked things that can go wrong. This is a shame, when it's perfectly possible to call C directly (and we can already do that, albeit in a limited way).
I see there are some benefits to the Purescript approach. But I also think the boilerplate is a drawback, and it doesn't necessarily make sense to take an FFI from another language unchanged since there may be ways we can use the Idris type system to our advantage.
In the end, whatever we decide to do, someone has to implement it. There's plenty of trade offs, and I think the person who gets to decide which way to resolve those trade offs is the person who does the job. (It won't necessarily be me who does it, especially if someone else gets there first :))
Let's say you have a package "A" that uses the FFI to do some generic networking stuff. You have package "B" that is an http client that depends on "A" . You then have package "C" that is a slack client that depends on "B". You have an application that depends on "C". Let's say you want to run your application on node.js and someone else wants to run it on the JVM. How can you make "A" run on both platforms?
One way I can think of is to have a generic callForeign
function. When you compile the library to ttc/ttm you don't need to think about the FFI. When you come to link this code you just need to provide the location of a JavaScript or Java .class file to the compiler. Package "A" could distribute the JavaScript and the .java files and your build system can compile the Java and then pass the location to the linker. Additionally this system is then "open". If library "A" didn't distribute the JavaScript file you could write it yourself. You could even write alternative files for testing.
The down side is what @edwinb said, you've given up on Idris. Personally I don't see this as much of a problem since when I used the purescript FFI recently, I was able to pull most of the JavaScript library out into purescript and just write extremely simple JavaScript code that passed through to quite primitive functions. However I'm sure there are situations where this fall's short.
If there is another way to do this then great, I haven't thought of one yet though.
Three reason I opened this ticket in the first place is because I had had trouble getting the Idris array library building with Bazel. I still don't know how to get it to work since Idris 1 uses make files and the link directive. Additionally it made me realise that it would be even more difficult with multiple backends. I want to avoid setting in stone these problems with Idris 2 if possible. As @edwinb pointed out, the people who write and merge the code really get to choose and that likely won't be me since I am planning to spend my time creating a package distribution ecosystem. I just want to define some scenarios that are currently difficult with Idris 1 in the hope that they can be improved upon in Idris 2. I find the pluggable backend such an exciting feature and a real USP for Idris, I want it to be as good as it can be.
Agreed, IMHO an FFI should be mostly used as a stop gap until Idris has every library it needs written in Idris.
A language that can interoperate with others around it is always better than one that expects you to port everything to it to be productive. Better interop also helps solve the adoption problem.
There might also be modules where the reimplementation effort wouldn't add anything meaningful, for example modules written in other safe languages. In those cases it might be better if the type info could cross the FFI boundary, than if the whole thing was replicated in Idris.
What other safe languages would those be? I don't think Idris' FFI is really being designed around say F*.
On Mon., Jul. 29, 2019, 21:30 rain, notifications@github.com wrote:
Agreed, IMHO an FFI should be mostly used as a stop gap until Idris has every library it needs written in Idris.
A language that can interoperate with others around it is always better than one that expects you to port everything to it to be productive. Better interop also helps solve the adoption problem.
There might also be modules where the reimplementation effort wouldn't add anything meaningful, for example modules written in other safe languages. In those cases it might be better if the type info could cross the FFI boundary, than if the whole thing was replicated in Idris.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/edwinb/Idris2/issues/19?email_source=notifications&email_token=AA2CKBATL5F7WDYCYSM2KQDQB6KRTA5CNFSM4IEFEMY2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3CO67Y#issuecomment-516222847, or mute the thread https://github.com/notifications/unsubscribe-auth/AA2CKBCP2T4N5DNYFGQXA6TQB6KRTANCNFSM4IEFEMYQ .
That's why I said "mostly" because there will always be valid use cases, however I think my point was that there shouldn't be many and they are quite niche.
I would also say that a language that operates more easily with other languages is not "better", merely more powerful or more expressive in a certain aspect. It's all about perspective, take elm for example, this is IMO the "best" language out there from many perspectives such as ease of use, beginner adoption, ability to show JavaScript developers the world isn't all chaos, quality of libraries, project stability etc etc, it is not the best language when it comes to interop but that's just one aspect. Particularly, "worse" interop was built in to elm specifically to help adoption and it seems to be working now.
I would also say that I've found it very frustrating in Haskell when libraries are just wrappers around C libraries, it's great as a quick fix but the API is never as good as it would be if it were native.
There are of course exceptions, perhaps cryptography? I just think how powerful the ffi is should not take priority over how portable and easy to build it is.
Incidentally I think if https://github.com/edwinb/Idris2/pull/45 can be portable then the FFI is awesome. What do you think @raingloom?
@shmish111 I'm not sure tbh, but it doesn't seem bad. I only interjected because I'd like the FFI to not be treated as merely a necessary evil. I'm not qualified to comment on the particulars of the implementation. I'd have to try it to comment. (which I hope to do so soon, now that a friend lent me a 16 gig VM. my 4 gig laptop is Not up to the task of building Idris 2 :cry: )
I've refreshed the FFI, and documented it here: https://idris2.readthedocs.io/en/latest/ffi/ffi.html
There's lots of requirements and constraints, but I think it's a good place to start that makes it possible to work with multiple languages, and gives back ends plenty of flexibility on how to work (indeed, that can drop to the purescript approach as a fallback if they want). So I'm going to close this now.
I still haven't updated the libraries to use the new approach (and in the case of the network library, it's a big-ish task) but I'll get around to it eventually unless someone beats me to it.
Oh wow, I've been away from Idris for a while but the new FFI looks really nice. There is one feature that I'm not sure is there or not, (sorry to harp on about PS but..) in purescript you can return an opaque value and assign it some type:
foreign import data MyType :: Type
Now I can pass some data of type MyType
around in PS land but I can't do anything with it other than pass it back to the FFI somewhere. This is very useful. Can this be done with the Idris FFI? Using Ptr t
perhaps?
That's mostly what Ptr t
is for, yes. The t
is just a phantom parameter that keeps types distinct.
I've recently been using the purescript FFI in anger and I have found it to be very nice to use. One of the main features of this style of FFI is that your code does not need to indicate the type of the underlying FFI. The examples given in purescript are javascript files but there is a C backend for purescript and you could write FFI code that would work in both with no changes, you only need to swap that the underlying .js and .c files that contain the appropriate function definitions.
I also think this could be developed in 2 stages, initially an FFI for chez could be implemented, this would likely be very simple to do since the RTS is chez and also because chez is dynamically typed. A second stage would be to implement a C FFI, this would require passing things through the chez ffi and will be more tricky to work out how to build and link and how best to deal with types, memory management etc. Chez has the tools for all this though so it will probably end up being more about how to deal with it in chez rather than how do deal with it in idris.
What do you think?