fff-rs / coaster-nn

Coaster plugin for backend-agnostic Neural Network operations
http://spearow.github.io/coaster-nn

OpenCL Backend #14

Open ghost opened 7 years ago

ghost commented 7 years ago

Leaf, and now Spearow, always promised OpenCL but has yet to deliver. I think that's a shame!

There are plenty of great folks working on ML experiments for Rust which feature OpenCL or frameworks thereof, including Arrayfire. I hope they'll forgive me for bringing Spearow to their attention:

@jonysy - One of the other fork-and-maintainers of Leaf
@botev - Past experiments in autodiff for Rust/ML
@sebcrozet - Too many Rust/math experiments to count, plus my favourite: rs2cl
@tedsta - Wrote a GPU n-array library and is building a deep learning toolkit on top of it
@jramapuram - Using arrayfire-rs to build an ML framework

There are lots of really clever individual efforts on Rust OpenCL ML, but I feel like a good push in one good framework would establish something useful. The above is kind of my dream-team; I hope flattery overcomes the annoyance of being mass-mentioned in here. :)

Any other suggestions?

botev commented 7 years ago

So from the small experimentation I have done, I was using https://github.com/cogciprocate/ocl for the actual interaction with the OpenCL runtime. You still need to write your own kernels.
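To make "you still need to write your own kernels" concrete, here is a minimal sketch: the kernel is a plain OpenCL C string, and the crate only drives it from the host. The kernel name `axpy` and all parameters are illustrative, and the ocl host calls are left as comments because they need an OpenCL runtime to execute:

```rust
// An illustrative OpenCL C kernel for y = a*x + y (SAXPY), kept as a
// plain Rust string constant: this is the part that still has to be
// written by hand even when using the ocl crate.
const SAXPY_SRC: &str = r#"
__kernel void axpy(const float a,
                   __global const float* x,
                   __global float* y) {
    size_t i = get_global_id(0);
    y[i] = a * x[i] + y[i];
}
"#;

fn main() {
    // Host-side usage with the ocl crate would look roughly like this
    // (shown as comments, since it needs an OpenCL runtime and device):
    //
    // let pro_que = ocl::ProQue::builder().src(SAXPY_SRC).dims(1024).build()?;
    // let x = pro_que.create_buffer::<f32>()?;
    // let y = pro_que.create_buffer::<f32>()?;
    // let kernel = pro_que.kernel_builder("axpy")
    //     .arg(2.0f32).arg(&x).arg(&y).build()?;
    // unsafe { kernel.enq()?; }
    println!("kernel source: {} bytes", SAXPY_SRC.len());
}
```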

Additionally, if you are interested in using OpenCL specifically for AMD cards, the new Vega would support the ROCm stack, which I think has similar "optimized" kernels (e.g. convolutions etc...) for the AMD cards.

Is there any technical paper about Spearow, or about how exactly it does the things it does? From the examples, it looks like it runs all of the operations synchronously, if I understand correctly?

drahnr commented 7 years ago

@botev the RX4xx series also supports the ROCm OpenCL backend, I have one here and did a few experiments with all 3 stacks on Linux/Fedora.

@cathalgarvey I am totally with you. I actually talked a lot to @subversive-owl about it and how to go about it. I already looked into ocl: it has way more features than we need, that complexity comes at a price, and its model does not really fit into the coaster architecture. So maybe re-use opencl-core, though right now I do not see any big feature gaps in that wrapper (or I am oblivious to them).

The API describes all of the operations synchronously; whether they are executed asynchronously is up to the backend implementation. If necessary, the API can easily be adapted.
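As a rough illustration of that split (the trait and names below are invented for this sketch, not coaster's actual API): the framework specifies each operation as a blocking call, and a backend is free to satisfy it eagerly, or to enqueue work internally and only synchronize before returning.

```rust
// Hypothetical backend trait: the *interface* is synchronous.
// A GPU backend could enqueue the work on a command queue and block
// on completion (or a fence) before returning; the caller can't tell.
trait Backend {
    fn axpy(&mut self, a: f32, x: &[f32], y: &mut [f32]);
}

// A native (CPU) backend simply computes eagerly.
struct Native;

impl Backend for Native {
    fn axpy(&mut self, a: f32, x: &[f32], y: &mut [f32]) {
        for (yi, xi) in y.iter_mut().zip(x) {
            *yi += a * *xi;
        }
    }
}

fn main() {
    let mut backend = Native;
    let x = vec![1.0, 2.0, 3.0];
    let mut y = vec![0.5, 0.5, 0.5];
    backend.axpy(2.0, &x, &mut y);
    println!("{:?}", y); // [2.5, 4.5, 6.5]
}
```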

So yes, this is planned and is what I am most excited about after fixing the last fallout bits of the dropout cuDNN implementation and finishing the LSTM/RNN layers.

@botev right now there is the doc/book which describes quite a few things about the internal structure of juice, though I am not sure if it suffices for your needs.

drahnr commented 7 years ago

@cathalgarvey I'd be happy to streamline efforts and get more traction. Every pull request is very welcome, and I'd happily talk about architecture and about how to share code or merge repositories in a combined effort.

ghost commented 7 years ago

Looking at @jonysy's work on these two repos:

https://github.com/lychee-eng/parenchyma-blas
https://github.com/jonysy/parenchyma-nn

...it looks like the skeleton of a good OpenCL backend already exists for a Leaf-derived project. I don't know how compatible the code in those repos is with coaster/juice as-is, but it's probably a good place to start.

Regarding ROCm, I've experimented with it on my GPUs and my experiences haven't been great. It's sometimes unstable, and a recent apt-delivered update made my system so unstable I had to reinstall Ubuntu (and configure it to keep a stable kernel around for future use). I'm awaiting a fix now so I can resume using ROCm because the only way to do OpenCL 1.2+ on Ubuntu 16.04+ right now is ROCm. :(

Meanwhile, I'm using Mesa, which gives a stable OpenCL 1.1 runtime that appears to work with arrayfire. So I'm going to resume my experiments with arrayfire-rs for a while; I'd like to make some type-safe wrappings that could be useful for ML. It's a shame they haven't stabilised the Arrayfire-ML repo yet, nor provided Rust bindings to it.

But, I'd prefer something that's pure-rust and targets many platforms at once, including framework-free CPU (e.g., I don't want to rely on Arrayfire being installed everywhere). Leaf/Juice/Parenchyma are the most mature-looking rusty platform to begin with, I think.

drahnr commented 7 years ago

@cathalgarvey I strongly recommend going with Fedora and just installing the OpenCL part of the AMDGPU-PRO packages; that works very well for me, and it actually allows you to pick either of the ICDs. But this is a bit offtopic.

I know about parenchyma, but half a year ago, when I talked to @jonysy it seemed we were following different goals. I am happy to re-evaluate that.

The last time I checked arrayfire-rs it had a lot of open issues and seemed to be pretty slow compared to other frameworks (on the transformations that matter most for ML; I am not sure where that information came from, though).

OpenCL 1.1 is a total no-go: in many places it is already stated that it is not threadsafe, and as such it is pretty useless. Most vendors managed to get to at least 1.2, though I am eyeing 2.x for the sake of features and ease of implementation. But that is open for discussion; I don't mind the other way round either. PRs welcome.

ghost commented 7 years ago

I'd be happy to see OpenCL 1.2-2.x too; 1.1 would just have been the icing on an already great cake. :)

If 1.2 is supported, then it's possible that Coriander could be used to maintain a hybrid CUDA/CL codebase, though Coriander isn't bug-free yet so a rigorous test suite would be required, and it would probably limit the flexibility of coding in CUDA.

Arrayfire-RS is far, far from perfect; the lack of type safety and the poorly documented segfaulting exhausted me last time I tried to use it. Speed isn't much of a concern when the baseline is "No support for non-CUDA GPUs at all". Clearly, OpenCL or Vulkan etc. would be better than using an intermediary platform, but I'll take whatever I can get!

I'm looking at Parenchyma now to see what it can do as-is; it looks like it stalled some months back, unfortunately. There have been compiler changes in Nightly that make some of it illegal now (const fields in traits), so it doesn't compile. I'll need to learn a bit before fixing it.

drahnr commented 7 years ago

That was one of the reasons I decided to discard arrayfire-rs.

If the path of Vulkan continues as I expect it to, then OpenCL and Vulkan will merge. They are already similar, and with some hackery around Vulkan compute shaders you can already do a lot. But for the time being: stick with OpenCL. I am not gambling, and I am not keen on investing a huge chunk of time in a poorly performing backend; as such, the library choice is crucial. We can continue this chat in https://gitter.im/spearow/coaster

I don't see much use in Coriander: there is no CUDA code here, all there is are cuDNN API calls.

Also: I'd rather invest effort in OpenCL kernels integrated into native Rust than in a language-dependent abomination on top of C++.

drahnr commented 6 years ago

@cathalgarvey I'd be happy to discuss a few more things on gitter/here regarding what has been done and what is planned.

ghost commented 6 years ago

Hi! Busy few days, sorry. But, I have been committing a little time to this:

One of the problems with bootstrapping OpenCL in languages other than C/C++ is that libraries which aim to make this easy by providing kernels etc. often dynamically generate those kernels (using the C preprocessor, or a C engine). I gather that OpenCL can perform preprocessing to a certain extent, but the macros/defines would have to be in the OpenCL source files.
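For what it's worth, the "defines" part can also be done on the host without any C machinery: prepend the #define lines to the kernel source string before handing it to the OpenCL compiler (clBuildProgram also accepts -D options). A hedged sketch, with an invented kernel and parameter names:

```rust
// Build a specialized OpenCL kernel source at runtime by prepending
// #defines, mimicking what C-based frameworks do with the preprocessor.
fn specialize(kernel_body: &str, defines: &[(&str, &str)]) -> String {
    let mut src = String::new();
    for (name, value) in defines {
        src.push_str(&format!("#define {} {}\n", name, value));
    }
    src.push_str(kernel_body);
    src
}

fn main() {
    // Illustrative kernel body; DTYPE and N are filled in per build.
    let body = r#"
__kernel void relu(__global DTYPE* x) {
    size_t i = get_global_id(0);
    if (i < N) x[i] = x[i] > 0 ? x[i] : 0;
}
"#;
    let src = specialize(body, &[("DTYPE", "float"), ("N", "1024")]);
    println!("{}", src);
}
```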

I found something promising in Samsung's sadly-defunct deep learning framework, VELES, the only one to make OpenCL a first-class citizen so far. They have what looks like a full set of "basic" kernels for data and NN operations, split between the core and the neural-network extension. And they look like "regular OpenCL" kernels. :)

The license is Apache; is that good enough to use in Coaster if the kernels were just copied in and built around?

drahnr commented 6 years ago

I'd like to discuss a similar set of things: using https://github.com/djc/askama to generate kernel structures from templates, fill them as required at runtime, and save the resulting artifacts in a cache so that subsequent runs don't require recompilation. This would even allow merging a few operations into a single kernel, reducing GPU memory accesses and the latency they introduce, but that is step 2.
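A minimal sketch of the caching half of that idea (hedged: askama renders templates defined at compile time, and a real cache would persist compiled binaries to disk; here a format string stands in for the template and an in-memory map for the artifact cache, with invented names throughout):

```rust
use std::collections::HashMap;

// Pretend-compiled kernel binaries, keyed by the parameters that were
// substituted into the template, so repeated requests skip "compilation".
struct KernelCache {
    compiled: HashMap<String, Vec<u8>>,
    compilations: usize, // counts how often we actually "built" a kernel
}

impl KernelCache {
    fn new() -> Self {
        Self { compiled: HashMap::new(), compilations: 0 }
    }

    fn get_or_compile(&mut self, dtype: &str, width: usize) -> &[u8] {
        let key = format!("axpy:{}:{}", dtype, width);
        if !self.compiled.contains_key(&key) {
            // Render the "template" and stand in for clBuildProgram.
            let src = format!("__kernel void axpy_{}(/* width = {} */)", dtype, width);
            self.compilations += 1;
            self.compiled.insert(key.clone(), src.into_bytes());
        }
        &self.compiled[&key]
    }
}

fn main() {
    let mut cache = KernelCache::new();
    cache.get_or_compile("float", 1024);
    cache.get_or_compile("float", 1024); // cache hit: no rebuild
    cache.get_or_compile("double", 1024); // new parameters: rebuild
    println!("compilations: {}", cache.compilations); // 2
}
```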

The reason I'd like to stick to dual-licensing Apache/MIT is mostly to allow GPL linkage, which I don't want to rule out but which Apache by itself does not permit.

ghost commented 6 years ago

Cool crate! So, do you mean using templating to construct kernels on the fly using Rust Macros? That seems hard to optimise, though a fully-fledged Rust DSL that compiles into kernels would be the golden fleece for OpenCL as far as I'm concerned. :)

drahnr commented 6 years ago

@DiamondLovesYou pointed me to his great work on a draft compiler extension that compiles Rust functions to OpenCL, which I am still looking into.

jonysy commented 6 years ago

If anyone's interested, I've started working on Parenchyma again... I've updated Leaf to make it compatible, but implementing new algorithms is a bit outside my normal area of expertise.

drahnr commented 6 years ago

@jonysy I've seen the activity but have not looked into it yet. Unfortunately I did not get around to getting much done on juice / the compute framework; this will hopefully change rather soon.