Open timsaucer opened 1 week ago
I have some experience in this area. While at NVIDIA, I created a POC with Rust bindings around cuDF and then provided interoperability with arrow-rs. Unfortunately, that code was internal and not open-source. I used cxx to create the bindings.
This repo (datafusion-python) once contained a prototype that translated a DataFusion logical plan into cuDF operations (all in Python). It is still somewhere in the history.
I see that there is now one RAPIDS library that provides Rust bindings: https://docs.rapids.ai/api/cuvs/nightly/rust_api/. It may be interesting to see what approach they took to wrap C++ in this case.
Edit: cuVS uses bindgen.
> This repo (datafusion-python) once contained a prototype of translating DataFusion logical plan to cuDF operations (all in Python). It is still there in the history somewhere.
Possibly #602.
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
As other DataFrame libraries begin to leverage GPU resources, it would be useful to see whether we could build on the work already done in pandas and polars to interoperate with cuDF and offer a similar experience in DataFusion.
Describe the solution you'd like
Evaluate the level of effort and the technical limitations of using cuDF to evaluate DataFrames. Also worth evaluating is its C++ interface, which we could potentially bring into DataFusion upstream if we are willing to write the appropriate wrappers.
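One way to frame the evaluation is as a coverage check: walk a logical plan and list the operators that have no GPU mapping yet. The sketch below is purely illustrative. `PlanNode`, `GPU_SUPPORTED`, and `unsupported_ops` are hypothetical names, not part of the DataFusion or cuDF APIs, and the supported set is invented for the example. It says nothing about what cuDF can actually execute.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """Hypothetical stand-in for a logical plan node (not the DataFusion API)."""
    op: str                                   # e.g. "TableScan", "Filter"
    inputs: list[PlanNode] = field(default_factory=list)


# Assumption: operators we pretend a GPU backend can already execute.
# This set is made up for illustration only.
GPU_SUPPORTED = {"TableScan", "Projection", "Filter", "Aggregate"}


def unsupported_ops(plan: PlanNode) -> list[str]:
    """Return operator names, in depth-first order, that lack a GPU mapping."""
    missing = [] if plan.op in GPU_SUPPORTED else [plan.op]
    for child in plan.inputs:
        missing.extend(unsupported_ops(child))
    return missing


# A small example plan: Projection <- Window <- Filter <- TableScan
plan = PlanNode("Projection", [
    PlanNode("Window", [
        PlanNode("Filter", [PlanNode("TableScan")]),
    ]),
])

print(unsupported_ops(plan))  # ['Window']
```

Running a check like this over a representative set of queries would give a rough picture of which operators need wrappers first and which plans would have to fall back to CPU execution.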
Describe alternatives you've considered
Leave as is.
Additional context
This task is focused only on researching what would be required and whether there is an opportunity here.