apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Support for other Device Types, OpenCL AMD GPU #621

Open philtomson opened 9 years ago

philtomson commented 9 years ago

It would be nice to eventually have OpenCL support for those of us with GPUs that don't do CUDA.

jermainewang commented 9 years ago

Hi,

We are considering this as well! The thing is that companies like AMD are actually developing CUDA-compatible environments for the future: http://www.anandtech.com/show/9792/amd-sc15-boltzmann-initiative-announced-c-and-cuda-compilers-for-amd-gpus. So we are still deciding whether to put our limited human resources into supporting OpenCL. If you are interested, you are more than welcome to help us enhance the project in this direction.

Thank you, Minjie

gujunli commented 9 years ago

Hi Minjie,

Maybe we can collaborate on extending MXNet with OpenCL support. We have open-sourced OpenCL Caffe; I guess we can reuse its core kernels?

Thanks! Junli

jermainewang commented 9 years ago

Ha, it's great to hear from AMD people here :p. I heard that the main problem with integrating OpenCL is its lack of support for templates, which are widely used in mshadow. @tqchen may know more details about this.

Minjie

mli commented 9 years ago

@gujunli It's very nice to see you here (we met at ICML Beijing last year). We are definitely interested in OpenCL and hope to support it ASAP. But the current issue is that we use C++ templates, which OpenCL doesn't support; see https://github.com/dmlc/mshadow/issues/71

gujunli commented 9 years ago

Templates are an issue. On AMD devices we have a special keyword that supports them, so no problem there. The problem is that the same keyword does not work on NVIDIA GPUs. We are also working on a general solution. I would like to hear your thoughts on this, @limu @minjie.

Junli



Junli Gu (谷俊丽), Coordinated Science Lab, University of Illinois at Urbana-Champaign


mli commented 9 years ago

NVIDIA GPUs should be fine with CUDA. Our main motivation for supporting OpenCL is AMD GPUs and other devices, such as FPGAs. For example, Altera also contacted us about making MXNet run on their devices.

tqchen commented 9 years ago

This won't be a problem as long as AMD's compiler supports something similar to what nvcc does, i.e., template programming and integration of host and device code.

What can be done is to have something like tensor_gpu-inl.amd.cc that specializes for AMD's keyword. As long as the extra keywords are minimal and the compiler can be detected by a macro, it should be fine.

philtomson commented 9 years ago

It would be nice to be able to target FPGAs and OpenCL would allow that to be done much more easily through the Altera tool chain.

Also: I'm not sure I understand the templates issue; isn't there a C API that could be used to get around it?

philtomson commented 9 years ago

I'll also add that OpenCL would allow targeting Intel integrated graphics, which are pretty common on a lot of laptops as well as desktops these days.

vchuravy commented 9 years ago

@philtomson The problem is more with the kernel code. OpenCL uses C as its kernel language, while MXNet uses C++ for its CUDA and CPU kernels and can generate both from the same template. That is nice because you don't need to maintain two or three different versions of each kernel.
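To make the template problem concrete: OpenCL C (the 1.x/2.0 kernel language discussed here) has no templates, so one common workaround is to generate type-specialized kernel source on the host. Below is a minimal Python sketch, assuming a simple axpy kernel; the kernel, `AXPY_TEMPLATE` and `instantiate` are hypothetical illustrations, not MXNet or mshadow code. It only builds the source string and does not compile or launch anything.

```python
from string import Template

# A single kernel body; ${T} is a placeholder for the element type. On the
# host it plays the role a C++ template parameter plays in mshadow/CUDA code.
AXPY_TEMPLATE = Template("""\
__kernel void axpy_${T}(const ${T} alpha,
                        __global const ${T}* x,
                        __global ${T}* y,
                        const unsigned int n)
{
    size_t i = get_global_id(0);
    if (i < n) {
        y[i] = alpha * x[i] + y[i];
    }
}
""")


def instantiate(types):
    """Return OpenCL C source containing one axpy kernel per element type.

    OpenCL C has no templates, so each type needs its own uniquely named
    kernel; generating them here avoids hand-maintaining near-identical copies.
    """
    return "\n".join(AXPY_TEMPLATE.substitute(T=t) for t in types)


if __name__ == "__main__":
    # Note: "double" kernels only build on devices that support cl_khr_fp64.
    print(instantiate(["float", "double"]))
```

The trade-off is that the C++ compiler no longer type-checks the kernel body; errors only surface when the OpenCL driver builds the generated source at runtime.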

ieee8023 commented 8 years ago

+1 I want to experiment on my laptop which does not have cuda support!

liangfu commented 8 years ago

@vchuravy Speaking of portability, maybe the problem is the use of templates in MXNet itself, because a neural network implementation doesn't really need templates for different data types. A typical neural network implementation most commonly uses single-precision floating point, because double precision is unnecessary and leads to much higher computational cost, and half-precision computation is not natively supported on many devices. Fixed-point data types are an entirely separate case of performance optimization. What people really want is a single efficient, flexible, minimal and yet portable neural network implementation that can be ported to multiple CPUs, GPUs and FPGAs. The design principles of MXNet meet almost all of these requirements except the last one.

mz24cn commented 7 years ago

Has anyone tried the AMD HIP tools on MXNet?

kernel8liang commented 7 years ago

+1

skirdey commented 7 years ago

+1

ghost commented 7 years ago

Really want to see this happen someday for a major Python framework besides Tensorflow (and without using a limited, experimental, proprietary compiler framework). Competition!

mz24cn commented 7 years ago

https://www.khronos.org/registry/OpenCL/ The OpenCL 2.2 C++ kernel language, including template support, is now in provisional status. Of course, so far no manufacturer has released 2.2 drivers.

delijati commented 7 years ago

This could help convert the CUDA kernels to OpenCL: https://github.com/hughperkins/cuda-on-cl

windywinter commented 7 years ago

Hi all,

I've been trying to tackle this problem for some time. @delijati From my investigation, cocl does not work very well, because mshadow is built on Thrust, which uses a lot of CUDA host-side APIs that cocl does not support. What we found promising instead is to use VexCL as the vector expression library (instead of mshadow) for GPU devices. Currently I have most arithmetic operators on NDArray working, but I still need to fill in a hell of a lot of symbolic operators for the whole framework to work. Proof-of-concept code is here: https://github.com/windywinter/mxnet

viper7882 commented 7 years ago

Hi all,

I'm looking at PyOpenCL, and it could be a solution for MXNet. The challenge I've observed so far is that PyOpenCL requires installing the Intel OpenCL SDK on the user's machine (if they are running an Intel graphics card).

For example, in "Easy OpenCL with Python", Gaston Hillar demonstrates building and deploying a kernel with PyOpenCL in only 12 steps. I've tested his code and it works for me.

I wonder whether MXNet would consider supporting PyOpenCL.

viper7882 commented 7 years ago

Update: I've tested Hugh Perkins's DeepCL running Q-learning on an Intel graphics card, and it runs perfectly in Python 2.7: https://github.com/viper7882/DeepCL.

Hugh Perkins has created EasyCL to access OpenCL-based GPUs: https://github.com/hughperkins/EasyCL. I'm evaluating whether it is possible to merge DeepCL with MXNet. Merging the two looks challenging to me because their underlying structures differ. Any help is appreciated.

viper7882 commented 7 years ago

Hi @jermainewang ,

Hugh Perkins has provided an NVIDIA® CUDA™ cuDNN API for Coriander (OpenCL 1.2), which ideally should be able to interface with MXNet's existing NVIDIA® CUDA™ cuDNN API.

Could you take a look at whether it makes sense to connect MXNet with OpenCL through this interface?

ghost commented 7 years ago

Also: ROCm/HIP support for MXNet is a thing; it might be worth moving wholesale in that direction to cover CUDA/HIP out of the box, and supporting OpenCL via Coriander. I'm not sure whether Coriander works on HIP code, but if the HIP code is compiled via the CUDA path, I don't see why not. It might even reduce the API surface Coriander has to cover?

https://github.com/ROCmSoftwarePlatform/mxnet

tqchen commented 7 years ago

An update on this: it can now be done via https://github.com/dmlc/tvm

springishere commented 7 years ago

@tqchen Do you mean that TVM supports OpenCL? I would like to use MXNet with OpenCL on an ARM GPU (Mali).

tqchen commented 7 years ago

Yes, TVM supports OpenCL, Metal, CUDA, ARM, x86 and JavaScript.

kpot commented 7 years ago

Hi all,

Guys, can anyone explain why MXNet still doesn't support OpenCL out of the box, even though it is now based on NNVM, and through it on TVM, and should be able to perform all the necessary computations on OpenCL devices? I checked NNVM recently and it looks fully up to the task. But even in the upcoming MXNet 0.12, the context mxnet.gpu() still means only "CUDA" and has no association with tvm.gpu() or tvm.cl(). Why?
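The decoupling asked about here amounts to letting a device context name a code-generation backend instead of implicitly meaning CUDA. A minimal Python sketch of such a dispatch, assuming TVM's target-kind strings ("llvm", "cuda", "opencl", "metal"); the dictionary keys and the `tvm_target_for` helper are hypothetical and not part of MXNet's API. It only picks a target string; actually compiling for that target would be TVM's job.

```python
# Hypothetical dispatch table: framework device type -> TVM target string.
# "llvm", "cuda", "opencl" and "metal" are TVM target kinds; the device-type
# keys and this helper are illustrative only, not MXNet API.
TVM_TARGETS = {
    "cpu": "llvm",
    "cuda": "cuda",
    "opencl": "opencl",
    "metal": "metal",
}


def tvm_target_for(device_type: str) -> str:
    """Return the TVM target string for a device type, or raise ValueError."""
    try:
        return TVM_TARGETS[device_type]
    except KeyError:
        raise ValueError(
            f"no TVM target known for device type {device_type!r}"
        ) from None


if __name__ == "__main__":
    for dev in ("cpu", "cuda", "opencl"):
        print(dev, "->", tvm_target_for(dev))
```

Keeping the mapping explicit means adding a backend is a table entry rather than a change to every place that assumes "GPU" implies CUDA.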

Perhaps more than 30% of the consumer GPUs out there are AMD- or Intel-made devices supporting OpenCL >= 1.2. Very often this is great, inexpensive and less restricted hardware, and making it available for training would greatly benefit the ML community.

welnaseth commented 6 years ago

Any updates on this? @kpot makes a good point above: TVM (and NNVM, being built on it) supports OpenCL, so it seems to me that it shouldn't be too hard to implement OpenCL as an option. It would be nice to have a timeline for when this could be implemented, or if not, to know what is blocking it.

conradwt commented 6 years ago

Hi all, are there any updates on this topic? I would like to see OpenCL become the default for MXNet, as well as for other ML libraries and frameworks, instead of GPU compute being restricted to NVIDIA hardware and CUDA.

itsergiu commented 6 years ago

Do you already provide an installation kit for AMD GPU RX550? Does it work with Windows 10? Does it work with Jupyter, Anaconda and Keras on top of Tensorflow?

edmondja commented 6 years ago

+1 waiting for it

dmidge8 commented 6 years ago

Also hoping to have it!

imkow commented 6 years ago

waiting for this...

aenikata commented 6 years ago

At the moment there are cloud providers like gpueater pushing the AMD option, which naturally leads toward Keras + PlaidML rather than MXNet. My ideal would be to take one of the (almost universally AMD-based) cryptocurrency mining rigs you can pick up for a reasonable price and see what deep learning can be done with it.

ammarRajabA commented 5 years ago

Can anybody update us about this?

mz24cn commented 5 years ago

https://github.com/mz24cn/clnet

metal3d commented 4 years ago

TensorFlow, PyTorch, MXNet... none of them listen to users on this need. I have an Intel card in 3 laptops; using the NEO OpenCL driver with LuxRender, for example, it computes 7x to 20x faster. But for ML, I can't use it.

OpenCL is open and unrestrictive, and it works on a large variety of cards; even the Raspberry Pi can use OpenCL (cf. Pi OpenCL).

Please consider SYCL, for example. Not all of us can afford Thunderbolt hardware...

leezu commented 4 years ago

@metal3d contributions welcome. Also see TVM

metal3d commented 4 years ago

@leezu Excuse me, but your remark doesn't seem serious. "Contributions welcome" is like saying "do it if you're so strong".

The kernel compilation core in this kind of machine learning framework is "central": it is chosen at the beginning and carried through the whole development process.

A contribution from "one guy external to the project" is not possible for that. If I wanted to work on it:

The problem is that we have been asking for OpenCL in a lot of frameworks for months, or years, and there are rarely any answers about:

We aren't forcing authors to use OpenCL; we only wonder why nothing has been done in that direction. Look at this issue: it has been open for 4 years.

4 years !

Look at the question on TF: https://github.com/tensorflow/tensorflow/issues/22

4 years too.

As for MXNet, we never had a "clear" answer, and there is no trace of anything explaining why, or how this need could be met. If someone is working on...

Worse: TF has only one active project for compiling it with SYCL. You need to register as a user and attempt a long compilation that fails 90% of the time.

So, sorry if my comment, question, and answers seem "aggressive", but 4 years is a long time without any clear answer such as "we won't do that", "we cannot", or "we will try", or an explanation of why it isn't happening.

So, "contributions welcome"... please... it's like asking someone in the street "sorry, what time is it?" for 5 hours, and then the man answers "go buy a watch".

leezu commented 4 years ago

I don't see any blocker to adding the feature you're requesting; there's just no one willing to work on it. You pointed out the constraints correctly: a lot of resources are required. Thus my comment is serious. TVM will solve the problem in the not-too-distant future, so there is no strong incentive to invest resources now in manually writing code that targets OpenCL. Did you take a look at https://docs.tvm.ai/tutorials/get_started.html#generate-opencl-code ?

metal3d commented 4 years ago

First, thanks for your answer.

I don't see any blocker to adding the feature you're requesting; there's just no one willing to work on it.

That's the problem we are pointing out. The problem I see (and others see it too) is that major frameworks seem to be trying to make things "faster and easier" before making the framework more widely usable. That's all we are saying... It's cool that CUDA is supported and that AWS or Google offer GPUs on demand. But in reality, OpenCL can help make ML more accessible to owners of modest hardware.

The problem has persisted for 4 or 5 years now. I hope you understand the frustration.

More than that, this gives NVIDIA a large monopoly that no one seems to want to stop...

As the article https://towardsdatascience.com/on-the-state-of-deep-learning-outside-of-cudas-walled-garden-d88c8bbb4342 explains:

Open source code that targets only a proprietary target is not exactly open open source. We can do better!

And I agree with that.

You said:

Thus my comment is serious.

Excuse me, it could be a translation problem (I'm not a native English speaker; excuse my bad English, BTW), but in French it sounds like "do it yourself". That's probably why I answered a bit aggressively.

As for TVM: no, sorry, I didn't know that project; I will take a look. I'm not sure it will resolve the issue, but the page you pointed to looks interesting. Thanks for that.

I hope that you don't take my comment too severely.

joaomamede commented 3 years ago

@metal3d It comes down to the traditions of wherever the developers are. Some projects choose to cut resources down to the minimum working objective, which means integration with a wide variety of options is left behind, even though spending money on other things (like NVIDIA hardware) doesn't seem to be a problem. Why do you think people still write these things on Windows, although it's a terrible platform for it? Tradition. It's the sad reality of limited resources and, mostly, of how people are trained. ROCm now works with MXNet apparently... by using nvcc code, lol. I also think OpenCL should be the way to go, since Intel, AMD, NVIDIA, etc. are all supported. And I guess for work I'll be forced to buy an NVIDIA card at 3x the price (instead of 3 GPUs of the same performance) to run my software, because most toolkits I use are for CUDA. AMD is a rich company and they lagged behind, and now they are forced to adapt ROCm to CUDA... instead of having something more general-purpose.

samurai815 commented 3 years ago

+1 waiting for it