bytedeco / javacpp-presets

The missing Java distribution of native C++ libraries

cuDF support (CUDA Data Frame) #665

Open archenroot opened 5 years ago

archenroot commented 5 years ago

As I understand it, this could become the de facto standard for Spark integration with RAPIDS for GPU acceleration in the future. I will work on this API; I am just opening this issue for reference.

saudet commented 5 years ago

Well, it looks like these guys like to do it the hard way too after all: https://github.com/rapidsai/cudf/pull/1995

Nevertheless, there are probably a lot of features from the C++ API not accessible with those wrappers, so similarly to OpenCV, I think it's still worthwhile to maintain separate wrappers for the C++ API.

archenroot commented 5 years ago

Shame on me: I waited for more stability and ended up missing the right moment. I am now finishing a job in the UAE and returning to the EU with some vacation, so I will look at that (I also still haven't pushed the gunrock APIs, shame on me :-) )

razajafri commented 5 years ago

@archenroot have you started work on the presets for cudf?

I started reading about JavaCPP and it seems like I need a list of headers in order of precedence. Is that true? If so, have you already started compiling a list that I can add to?

archenroot commented 5 years ago

@razajafri - I am mostly monitoring what the RAPIDS team is going to decide: https://github.com/rapidsai/cudf/pull/1995

razajafri commented 5 years ago

@archenroot I am a contributor to the RAPIDS Java bindings, and the reason I reached out is so that I could evaluate JavaCPP. Please let me know if you have done any work on it that I can build on.

saudet commented 5 years ago

@razajafri I see! Thanks for reaching out. The C++ API itself looks pretty clean, so it shouldn't be harder to map than CUDA itself, which is basically these presets files here: https://github.com/bytedeco/javacpp-presets/tree/master/cuda/src/main/java/org/bytedeco/cuda/presets

We do need to list the header files that we wish to map in an order that makes sense with respect to C++, yes, a topological sort of sorts. (Something that could be automated up to a point, which will happen when I get the chance to work on this, but probably not before a couple of years...)
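To illustrate what "a topological sort of sorts" means for header ordering, here is a minimal sketch in plain Java. It is not part of JavaCPP and not how JavaCPP works today; it assumes the include graph has already been extracted somehow, and the header names (`types.hpp`, `column.hpp`, and so on) are hypothetical placeholders rather than actual cuDF headers. It uses Kahn's algorithm to order headers so that each one comes after everything it includes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HeaderOrder {
    /**
     * Orders headers so that every header appears after the headers it includes.
     * The map goes from a header to the list of headers it includes directly.
     * Throws IllegalStateException if the include graph contains a cycle.
     */
    public static List<String> sort(Map<String, List<String>> includes) {
        // pending: number of not-yet-emitted includes per header
        // dependents: reverse edges, from a header to the headers that include it
        Map<String, Integer> pending = new LinkedHashMap<>();
        Map<String, List<String>> dependents = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : includes.entrySet()) {
            pending.putIfAbsent(e.getKey(), 0);
            for (String dep : e.getValue()) {
                pending.putIfAbsent(dep, 0);
                pending.merge(e.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }

        // Start with headers that include nothing else
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String header = ready.remove();
            order.add(header);
            // Emitting this header may unblock the headers that include it
            for (String d : dependents.getOrDefault(header, Collections.<String>emptyList())) {
                if (pending.merge(d, -1, Integer::sum) == 0) {
                    ready.add(d);
                }
            }
        }

        if (order.size() != pending.size()) {
            throw new IllegalStateException("cyclic includes, cannot order headers");
        }
        return order;
    }

    public static void main(String[] args) {
        // Hypothetical include graph, deliberately listed in the "wrong" order
        Map<String, List<String>> includes = new LinkedHashMap<>();
        includes.put("io.hpp", Arrays.asList("table.hpp"));
        includes.put("table.hpp", Arrays.asList("column.hpp", "types.hpp"));
        includes.put("column.hpp", Arrays.asList("types.hpp"));
        includes.put("types.hpp", Collections.<String>emptyList());
        System.out.println(sort(includes));
    }
}
```

The resulting list is the kind of ordering one would then write by hand into the `include` list of a presets class. In practice the hard part is building the include graph reliably, which is why this can only be automated "up to a point".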

Now, if I understand correctly, cuDF depends on Arrow, so we would need to map that one first. The official Java wrappers for Arrow are pretty limited and not very efficient, so we are already considering providing our own wrappers for the C++ API. In other words, it's something I will probably get to do as part of my work anyway (and then other developers will be able to start providing more idiomatic APIs on top of that easily). Do you guys have a timetable in mind for this?

/cc @agibsonccc

razajafri commented 5 years ago

@saudet thanks for the detailed explanation. cudf doesn't directly expose any Arrow APIs that I know of. Do we still need to provide Java presets for it? Can it not just be a lib dependency instead?

TBH we don't have the bandwidth for this; I was going to spend a couple of hours on it to see how easy or difficult it is to add Java presets for cudf. I am still willing to contribute as much as I can, as a personal goal of mine. I would love to get on a call with you or anyone else willing to go over the process of creating presets. I have read the documentation already, and I would like to know more about automating the header topsort.

saudet commented 5 years ago

If it doesn't expose any data structures from Arrow, yes, we don't need to do anything for that separately. Although it would enhance interoperability if we did, so it's still worth doing at some point in any case.

Listing the header files really isn't an issue. What takes most of the time is figuring out the right "info" to make everything parse and compile. The other thing that takes most of the time is understanding how to make the library actually build. cuDF doesn't appear easy to build, for example, see issue https://github.com/rapidsai/cudf/issues/2770. Imagine you were a newbie and had to build cuDF on all supported platforms. I estimate that it would take about the same amount of time that we need to tinker with the header files to get them working properly, at least a few days. Does that sound like too much work? On the other hand, wrapping everything manually with JNI would take much much longer, while a limited Java API wouldn't be useful for many use cases.

Another way we could split the workload is having me write the presets for cuDF, and having someone like you get the builds passing on, for example, Travis CI, and write the high-level APIs on top of the presets. This is something that I would be able to do as part of my work since we'd be in effect collaborating with NVIDIA on that. We can have calls too if you'd like, that's fine. Please send me an email anytime you'd like to schedule something!

saudet commented 4 years ago

@razajafri I've finished creating initial presets for Arrow here: https://github.com/bytedeco/javacpp-presets/tree/master/arrow Presets for cuDF would most likely be very similar to that and once created they are very easy to maintain. If this looks like something you would like to use either as is or to build a high-level API on top, please let us know!