Cuda backend - GitHub issues

My mind has been on Futhark lately so I thought it would be time to open this issue in order to track the state of the eventual Cuda backend.

I've long been thinking about making a backend for Futhark for my own language, but I've decided that I am going to wait until Futhark supports Cuda. The reason for that is that I want Spiral to be deep learning focused. I've read that the general GPU market is 75% NVidia and 25% the rest, but in terms of deep learning market share it is more like 100% NVidia and 0% the rest. All the best libraries are on the Cuda side, and since much like Cuda, OpenCL is a product of large multinationals, I see no reason to support them free of charge against my best interest.

Supporting only Cuda in my own language will make it easier to focus my development effort and make a product that excels at one thing rather than be average at two.

In the following post I will go into the pros and cons of doing a Futhark backend for my own language in detail. I intend to have it be feedback for the authors, and for myself it will be a way of crystallizing what I want my commitment to Futhark to be.

If you wish for CUDA support (which would be easy to add) for the libraries, you would probably also want Futhark to grow an FFI (which is a little trickier).

Pros:

1) Futhark has great ergonomics. Having such array capabilities as it does would definitely be a benefit to my own language. It would also free me up from having to write my own kernels and maybe cover 90% of my GPU programming needs in terms of that aspect. This is the main reason why I find the language attractive.

2) The actual backend would not be hard to do at all. Discounting the unexpected, it would take me less than a week to do it. If I allot the time for dealing with the inevitable bugs that would arise in the interop process, I'd have to say ~2 weeks and I consider this a pessimistic estimate.

Neutrals:

1) This is just a personal decision and not a criticism of Futhark for having chosen OpenCL over Cuda as a starting point, but I do not want to go through 2 weeks of effort only to have to shelve the backend afterwards for an indeterminate amount of time.

Also, Spiral is not ready for this kind of thing anyway. I think now that I finally got it into a decent shape, I want to spend the next 6 months building all sorts of useful machine learning functionality into it anyway. The benefit of this is also that it will give Futhark time to develop the Cuda backend and the FFI.

2) Originally I wanted Futhark to have an F# backend, but now that I have my own language, I do not need it anymore. A C API should be fine.

Cons:

1) Futhark is a high level language with no low level story to speak of. The fact that it is a high level GPU language is what drew me to it, but also what fuels my distrust towards its general design. To put it frankly, in Futhark, the programmer is much at the mercy of the optimizer and that will inevitably lead to problems.

This is a big deal to me as in my own language, I am aiming to have it excel at deep learning and I know I am going to have to take advantage of various Cuda libraries to achieve the desired result.

When I tally up the pros and cons, in the end I always conclude that I'd rather have those optimized Cuda libraries rather than Futhark. Hence it is important that Futhark work with those libraries rather than compete against them.

2) Imagine a scenario where I depend on Futhark and have a lot of code bound to it and the authors decide to move on. I am aware that research languages like Futhark have a lifespan. Most languages live for about a year before dying off. Others tend to be worked on for years before their author gets a PhD and does something else. Only a small minority of them live on and come into widespread use.

It used to be the case that unmarried men were seen as less trustworthy and stable so their relatives would find a way to get them married.

I get the sense that Futhark is really looking widely for opportunity. I already knew from @Athas's posts that he has large plans for Futhark. Mobile? Ok. Cuda? Ok. .NET and Java backends? Ok. FPGAs? Maybe if he has the time...

It being a general purpose language? Now that was a surprise for me. I am really wondering when @Athas will find the time to do all of that? Being overwhelmed with work is a great way to not get anything done. I mean Spiral is only 1/8th the size of Futhark and it still took me a year to make and I am barely done despite it being a full time job for me. One might have great desire now, but change his mind with the passage of time.

I think Futhark would be well served by just having the Cuda backend and the FFI for starters. I am amending my original prerequisite for doing the Futhark backend for Spiral to it also having the FFI. I am also going to need it not just to interop with Cuda libraries, but also to interop with my own language's Cuda backend.

Also I once suggested inline C, but if it had the FFI, I could compile my Spiral's Cuda kernels separately so it would not be a problem.

When I first found out about Futhark over a year ago, in my own ML library I was using ugly wrappers around raw C code and the little generic capabilities I had were through passing text macros. It was extremely unsafe, inelegant and effort intensive to have those things lying around and I wanted to abstract them away at all cost. My interest in Futhark was born from that desire.

Today that cost has mostly been paid and I have an extremely expressive and efficient language with excellent interop capabilities for both Cuda and .NET. I think its capabilities are such that it could integrate well with any language, including Futhark.

I am going to need to go to a certain length before I can decide whether it would be worth doing the Cuda stuff in my own language vs offloading it to Futhark. I am committed to doing the experiment and doing my best job once the time comes to do so, but after it is done I make no promises to support Futhark and will consider the obligations I set (mostly to myself) discharged. Neither will I say that I am likely to do so as Spiral is a significant competitor to any language in any domain.

Well, that is about what I wanted to say regarding how I saw my obligations.

Sorry if the contributions you'd prefer would be more like doing work on the language by contributing actual code, but I tried to rev myself up to do that once and it did not work out. In fact way back, I once tried to make a Cuda backend for the Diffsharp (automatic differentiation) library and much like the time I tried doing a parser for Futhark, I generated a lot of heat and not much useful work. From those experiences I've learned that when I tell myself that I am doing things for others, I tend to slip into thinking in shoulds and lose sight of my goals a bit too easily. I want to learn from that and make individual steps that are more grounded.

So let me talk a bit about what Futhark would gain from the experiment being a success and resulting in symbiosis.

1) Access to the .NET ecosystem via Spiral.

2) When programmed from Spiral, Futhark would have access to its first class staging and intensional polymorphism capabilities. What this means is that Futhark would not need to include abstractions such as higher order functions directly, as Spiral would provide them for it. There is a disadvantage to this in that Spiral pushes the responsibility for dealing with termination onto the user, but when used in combination with Futhark that should not be much of a problem.

The main abstractions Spiral offers are essentially costless. Via staging, monadic workflow would be possible directly on the GPU. While something like that would be mindblowing, it would not be that useful as GPUs aren't suitable to take advantage of that yet, but with every generation they seem to be picking up more general purpose capabilities.

Also since Spiral has a great low level story it would mean that performance oriented users would not need to move back to C when they start encountering performance difficulties which will be inevitable given Futhark's high level design.

3) The libraries that are specific to Spiral, such as its deep learning library. Spiral itself does not have any capabilities specific to AD, but the way it handles functions would be extremely suitable for that sort of thing. I read the paper on using Polyvariant Union-Free Flow Analysis where Siskind and Pearlmutter made a language with inbuilt AD capabilities and got one to two orders of magnitude speedups on AD related tasks. The way Spiral attains its efficiencies is similar to that language, but in Spiral I went a great deal further and gave it F#'s syntax, pattern matching, first class modules, tuples, union types (so it is not Union-Free) and on the host side it also has recursive datatypes and unstaged functions.
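To make the staging point in 2) a bit more concrete, here is a minimal sketch in Python rather than Spiral. It is not how Spiral is implemented; the names `scale`, `shift`, `compose` and `emit_kernel` are made up for illustration, and the emitted text is just C-like source. The idea it shows is that higher order functions can live entirely in the generating language and disappear before the target language ever sees the code.

```python
# Hypothetical illustration of staging: the combinators below are ordinary
# higher order Python functions, but what they produce is first-order source
# text. None of these names come from Spiral or Futhark.

def scale(k):
    # Returns a code generator that multiplies an expression by k.
    return lambda expr: f"({expr} * {k})"

def shift(k):
    # Returns a code generator that adds k to an expression.
    return lambda expr: f"({expr} + {k})"

def compose(f, g):
    # Function composition, resolved at generation time.
    return lambda expr: f(g(expr))

def emit_kernel(name, body):
    # Apply the staged body to the symbolic argument "x" and wrap the
    # resulting expression in a first-order function definition.
    return f"float {name}(float x) {{ return {body('x')}; }}"

source = emit_kernel("f", compose(shift(1), scale(2)))
print(source)
```

The generated function contains no closures or function-valued arguments, so a target without higher order functions could still compile it.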

As a language designer, I place a great emphasis on integration and want to push Spiral along those lines.

The true allure of integration is not just what can be done now, but what will be possible to do tomorrow. Huge deep learning frameworks like TensorFlow and CNTK are languages in themselves and their very nature makes them static, brittle and unsuitable for things like reinforcement learning where the graphs are dynamic. Also, while being high level, TensorFlow in particular is not fast. The defunct Theano, which was the first big framework, was known for its excessively long compilation times. The new hotness, which is PyTorch, actually uses C under the hood and C-preprocessor macros to make its tensors generic in types. This is the level of Facebook corporate programming circa late 2017, where people are still wedded to 1970s technology and thought patterns. Right now they are working on a tracing JIT for the framework and still haven't gotten it to handle control flow. My sense is that they are going in completely the wrong direction.

Most definitely those big frameworks will all get replaced when something better comes along. And if that does not happen then they will be making a mistake.

I have already started work on the ML library for Spiral and will hopefully have a sneak peek done by the end of the year after which I will move to doing language tutorials for Spiral which will allow you to judge what the language is about for yourself.

Sorry about the essay length posts, but I have an idea I want to run by you. Please tell me if it all sounds appealing to you.

I've been doing some more thinking and I think I understand the essential complication that bothers me about the design of Futhark - that its code will essentially be written by humans. This presupposes a few things like the need for modules, libraries, FFI, generics and such. As a pure array language, Futhark is essentially at odds with such a design, because to do anything with the real world it might need to stop being pure, and might need to stop being an array language and become a general purpose one.

Trying to go in such direction while trying to keep the purity and array orientation in the language will probably make your life hell.

My suggestion is thus - Futhark at its core does not really need the above mentioned things. It does not need the FFI, modules, typeclasses, type inference, libraries, pattern matching, destructuring like a general purpose language would.

What you could do is at some point split it into two - keep a simply typed core that the machines could compile to, and since you have such sympathies develop the outer shell into a fully fledged language that would use the core as a DSL. This is in fact how I would prefer to use Futhark (I mean the core) and would have the benefit of clearing up your vision regarding where you want the language to go because it is obvious to me that you want it to go a lot further, but a pure array language can only go so far.

Regarding the FFI, it would actually be beneficial if you did not do it - .NET already has a wrapper library for Cuda, it is pretty huge and it would be a significant expenditure of effort to duplicate that work in Futhark. It is not just that the FFI will be tricky to make; porting the entirety of the Cuda libraries would need to be automated and would take up a significant chunk of time.

What having two languages would also allow you to do is tune the interop. Right now since Futhark is an isolated island, you've been shielded from having to deal with that aspect of language design, but if you have to deal with it you would have the incentive to optimize it.

Going down this route would also allow you to ditch the Python, .NET and Java backends, leave only a single point of entry to the language and allow you to focus your effort on just that. Spiral for example, while being a .NET language would not actually derive any benefit as far as I can tell from there being a .NET backend since it would compile to it directly.

I noticed that you posted on the PL sub about using Futhark as a library, but I honestly have to apologize as I have not tried it. I probably should have as I would have some concrete feedback on passing state between languages, but I've no interest in trying it out on the OpenCL side.

Let me just change again my request to you just making the Cuda backend. Since you mentioned it was easy then you should probably do it as soon as possible.

If you take something like half a year to do it or maybe even earlier than that, I will have probably built up the deep learning library for Spiral in its entirety by then and won't have any need to go to Futhark for its GPU services which will kill my motivation to experiment with it.

For that sake I will also change my mind - instead of working on the library for a long while first like I wrote in the previous post, I will drop whatever I am doing and drum out the Futhark backend for Spiral right there. I think this would be ideal - rather than push work into the future where it could be swept under the carpet, it would be best to strike while the iron is hot.

I thank you for your understanding and apologize for these long stream of consciousness rants.

I'd like you to give me an ETA for when you would like to do the Cuda backend so I know what to expect please.

I think I need to make my own motivations for writing Futhark clear: I want a nice language in which I can do data-parallel programming, and I think Futhark is shaping up to be just that language. In many cases I find that the implementation is cleaner and simpler than in Haskell, Python, or any of the other general-purpose languages. That the runtime performance is good is of course also important. So far, Futhark is doing an increasingly good job at supporting modules, generics/polymorphism, and libraries, and we haven't run into dire problems yet. I don't just need it as a target for code generation from other languages (although we're using it for that, too).

I should note that when I call Futhark "general-purpose", I mean something else than when I call Haskell "general-purpose". Futhark is not ever supposed to be an application programming language, but it is not tied to GPUs specifically - that's the "general-purpose" part. It will remain a small and pure language for writing computational kernels, with a compiler that can generate high-performance parallel code.

Futhark does have a core language, which is simpler than the source language (no polymorphism, modules, etc), that I have considered granting an external syntax, but I'm not convinced it's much better as a code generation target. If you just target a subset of source Futhark, without generating modules or anything like that, it's more or less the same thing. Notably, the core language is still pure, and does not have the ability to call foreign code.

With respect to the FFI, I must make it clear that there are two different things that are both called "FFI":

1) Calling Futhark programs from other languages. This is currently supported for both Python and C (and since most languages can call C, this is fairly universal). Futhark was always designed with this use case in mind, and it works well in practice, although with some rough edges. We already have many examples of programs where the computational core is in Futhark, and another language is used for user interaction.

2) Calling other languages from Futhark. This one is more tricky, but also less immediately useful. It might be possible to add an FFI that can let you call some optimised matrix multiplication function, but what happens if you perform an FFI call inside a parallel region? You can't do arbitrary function calls inside of GPU kernels. Adding this is possible, and will happen eventually, but requires a lot of fallback logic and deep thinking.
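As a rough illustration of why the second kind of FFI needs fallback logic, here is a toy sketch in Python. It is not how the Futhark compiler works: the `Prim`, `Foreign` and `Map` node types, the `plan` function and its result tags are all invented for this example, and "run the whole map as a host loop" is only one assumed fallback policy among several possible ones.

```python
from dataclasses import dataclass

# Hypothetical mini-IR, invented for this sketch.

@dataclass(frozen=True)
class Prim:
    # A built-in scalar operation that can be emitted inside a kernel.
    name: str

@dataclass(frozen=True)
class Foreign:
    # An opaque call into a host library; it cannot run inside a kernel.
    name: str

@dataclass(frozen=True)
class Map:
    # A parallel map whose body is a tuple of operations.
    body: tuple

def contains_foreign(op):
    # True if a foreign call occurs anywhere inside this operation.
    if isinstance(op, Foreign):
        return True
    if isinstance(op, Map):
        return any(contains_foreign(o) for o in op.body)
    return False

def plan(op):
    # Decide where each operation runs under the assumed fallback policy.
    if isinstance(op, Map):
        if contains_foreign(op):
            # Fallback: keep the loop on the host and plan each body
            # operation separately.
            return ("host_loop", tuple(plan(o) for o in op.body))
        return ("gpu_kernel", op)
    if isinstance(op, Foreign):
        return ("host_call", op.name)
    return ("host_scalar", op.name)

pure = Map((Prim("add"), Prim("mul")))
mixed = Map((Prim("add"), Foreign("matmul"), Map((Prim("exp"),))))
print(plan(pure)[0])
print(plan(mixed))
```

In this toy policy a single foreign call pulls the enclosing map out of the kernel, while the inner map that has no foreign call can still be planned as a kernel.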

Regarding the CUDA backend, I have no immediate plans to write it myself. I estimate it will take no more than a month or two of work, since it could be designed along the lines of the existing OpenCL backend, but these are months I could spend on things that I derive more benefit from. If I ever need to interoperate with someone (such as an industrial partner) that absolutely only uses CUDA, then I'll add it, but otherwise my incentives are lacking. I'll continue proposing it as a project to the students at the department, so there's always a chance one of them might find it interesting and pick it up. Given that the next round of projects is due to start in January, and they usually take five months to complete, you're looking at more than half a year, and that's if a student finds that project more interesting than the alternatives.

-- \ Troels /\ Henriksen

Very well, and thank you for your reply. It is actually good for me that it is this far out, as it makes my decision on what direction to take the language and the ML library in a lot easier. Today I mostly spent the day thinking of what I wanted to do with Futhark. If you had said something like 2-3 weeks then I would have waited a bit before committing myself to writing the library Cuda kernels in Spiral, but something like two months would have me rewriting stuff in the middle.

With six months... well, I probably won't be using Futhark in the ML library ever, but maybe for some different project I might consider it. That said, the two languages will probably end up being competitors if you ever come to the Cuda side. I look forward to seeing what Futhark will become.

I suggest closing this issue and reopening it under a different thread since I cluttered it up so much with my comments.

diku-dk / futhark

Cuda backend #434