dariocurr opened this issue 1 week ago
The binding is created as follows:
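```rust
use pyo3::prelude::*;

// `get_context` is the #[pyfunction] defined elsewhere in my-library-python
// that wraps the SessionContext in a datafusion_python PySessionContext.
#[pymodule]
fn module(_py: Python<'_>, main_module: &PyModule) -> PyResult<()> {
    main_module.add_function(wrap_pyfunction!(get_context, main_module)?)?;
    Ok(())
}
```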
I haven't done what you're trying to do before, so a minimal GitHub repo that reproduces the problem would be helpful.
This error is the Python runtime trying to import `datafusion`, and it isn't clear to me from your example why or where that would happen:
ModuleNotFoundError("No module named 'datafusion'")
I haven't done any digging into why the code is like this, but the end result is that you will probably need `datafusion` as a Python dependency.
The code stores the Tokio runtime on the Python heap to ensure it only gets created once, which provided performance improvements.
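If the runtime lives in the `datafusion` Python module, Rust code that needs it has to fetch it from that module by name at call time. The toy model below is a sketch under that assumption, not datafusion-python's actual code: `Interpreter`, `Runtime`, and `import_runtime` are hypothetical stand-ins for the Python interpreter, `tokio::runtime::Runtime`, and the lookup, and no real pyo3 or Tokio API is used. It shows why statically linking the Rust crate into `my-library-python` does not avoid the error: the lookup succeeds only if the Python package is installed, and once it is, the runtime is created a single time and reused.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Arc;

/// Hypothetical stand-in for `tokio::runtime::Runtime`; only records an id.
#[derive(Debug)]
pub struct Runtime {
    pub id: u32,
}

/// Hypothetical stand-in for the Python interpreter: which packages are
/// installed, and which modules are already imported (like `sys.modules`).
pub struct Interpreter {
    installed: HashSet<String>,
    loaded: HashMap<String, Arc<Runtime>>,
    pub runtimes_created: u32,
}

impl Interpreter {
    pub fn new(installed: &[&str]) -> Self {
        Self {
            installed: installed.iter().map(|s| s.to_string()).collect(),
            loaded: HashMap::new(),
            runtimes_created: 0,
        }
    }

    /// Fetches the shared runtime by module name. The first successful import
    /// creates the runtime; later calls reuse the cached one.
    pub fn import_runtime(&mut self, module: &str) -> Result<Arc<Runtime>, String> {
        if let Some(rt) = self.loaded.get(module) {
            return Ok(Arc::clone(rt));
        }
        // Linking the Rust crate into another extension does not make the
        // Python package importable, so this is the check that fails.
        if !self.installed.contains(module) {
            return Err(format!("No module named '{module}'"));
        }
        self.runtimes_created += 1;
        let rt = Arc::new(Runtime { id: self.runtimes_created });
        self.loaded.insert(module.to_string(), Arc::clone(&rt));
        Ok(rt)
    }
}

fn main() {
    // Only my-library-python is installed: the lookup fails.
    let mut without = Interpreter::new(&["my_library_python"]);
    println!("{:?}", without.import_runtime("datafusion"));

    // With the datafusion package installed, the runtime is created once.
    let mut with = Interpreter::new(&["my_library_python", "datafusion"]);
    let a = with.import_runtime("datafusion").unwrap();
    let b = with.import_runtime("datafusion").unwrap();
    println!("runtime {} shared: {}", a.id, Arc::ptr_eq(&a, &b));
}
```

In this model, adding the package as a Python dependency is what makes the lookup succeed, which matches the suggestion above.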
The runtime could perhaps be created once per `SessionContext` or similar instead, but that would be a sizeable refactor.
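The per-`SessionContext` alternative can be sketched as a context that owns its runtime rather than fetching one from a Python module. This is a hypothetical illustration, not a proposed patch: `Context` and `Runtime` are stand-ins (the real types would be DataFusion's `SessionContext` and `tokio::runtime::Runtime`), and `block_on` here simply runs the job synchronously.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `tokio::runtime::Runtime`.
pub struct Runtime {
    pub worker_threads: usize,
}

impl Runtime {
    pub fn new(worker_threads: usize) -> Self {
        Self { worker_threads }
    }

    /// Stand-in for `block_on`: runs the job synchronously on the caller's thread.
    pub fn block_on<T>(&self, job: impl FnOnce() -> T) -> T {
        job()
    }
}

/// Hypothetical context that owns its runtime instead of looking one up in a
/// Python module. Clones share the runtime through `Arc`; every new context
/// builds a new runtime, which is the cost a single shared runtime avoids.
#[derive(Clone)]
pub struct Context {
    pub runtime: Arc<Runtime>,
}

impl Context {
    pub fn new() -> Self {
        Self { runtime: Arc::new(Runtime::new(4)) }
    }

    pub fn run<T>(&self, job: impl FnOnce() -> T) -> T {
        self.runtime.block_on(job)
    }
}

fn main() {
    let a = Context::new();
    let b = a.clone(); // shares a's runtime
    let c = Context::new(); // builds its own runtime
    println!("a/b share: {}", Arc::ptr_eq(&a.runtime, &b.runtime));
    println!("a/c share: {}", Arc::ptr_eq(&a.runtime, &c.runtime));
    println!("result: {}", a.run(|| 2 + 2));
}
```

The trade-off is visible in `main`: no Python import is needed, but each independently created context pays for its own runtime.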
@andygrove confirmed that your use case is something that datafusion-python should support, and pointed to datafusion-ballista as an example.
@dariocurr, do you have a repo link to share, so that I can investigate further?
I just created an MRE here: my-library-datafusion.
Following the instructions and running:
```shell
conda env create
maturin develop
pytest tests
```
You will get:
ModuleNotFoundError("No module named 'datafusion'")
I really don't know how it should work; I am here to ask and learn from you.
Thank you for your time!
Hi guys, I'm Dario. I have been struggling with an issue and I am trying to understand it.
I am trying to create my own cross-language library on top of `datafusion` and `datafusion-python`. Let's call this library `my-library`. I created a Rust workspace with two crates:
- `my-library`, to be used by other Rust crates
- `my-library-python`, to be used by other Python packages

`my-library` has `datafusion` as a dependency and has just one function, which returns a `datafusion::execution::context::SessionContext`.
`my-library-python` has `datafusion-python` as a dependency and has just one function, which wraps the `datafusion::execution::context::SessionContext` in a `datafusion_python::context::PySessionContext`.
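The two-crate layering described above can be sketched in a single file with stand-in types. Everything here is hypothetical: `SessionContext` and `PySessionContext` are simplified local structs rather than the real DataFusion and datafusion-python types, the `session_id` and `ctx` fields are invented for illustration, and the `From` conversion is an assumption about how the wrapping is done, not the actual datafusion-python API.

```rust
/// Hypothetical, simplified stand-in for
/// `datafusion::execution::context::SessionContext`.
#[derive(Debug)]
pub struct SessionContext {
    pub session_id: String,
}

/// Hypothetical stand-in for `datafusion_python::context::PySessionContext`:
/// a thin wrapper that would be exposed to Python via pyo3.
#[derive(Debug)]
pub struct PySessionContext {
    pub ctx: SessionContext,
}

impl From<SessionContext> for PySessionContext {
    fn from(ctx: SessionContext) -> Self {
        Self { ctx }
    }
}

/// "my-library": plain Rust API with no Python involved.
mod my_library {
    use super::SessionContext;

    pub fn get_context() -> SessionContext {
        SessionContext { session_id: "my-library".to_string() }
    }
}

/// "my-library-python": wraps the core crate's value for Python. In the real
/// crate this would be a `#[pyfunction]`; here it is a plain function.
mod my_library_python {
    use super::{my_library, PySessionContext};

    pub fn get_context() -> PySessionContext {
        my_library::get_context().into()
    }
}

fn main() {
    let py_ctx = my_library_python::get_context();
    println!("{}", py_ctx.ctx.session_id);
}
```

Keeping the core crate free of Python types lets Rust users depend on `my-library` alone, while only the binding crate pulls in the Python-facing wrapper.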
Now, when I install `my-library-python` in my Python env through `maturin develop` and try to play with the `SessionContext` returned by the binding as follows:
I get the following error:
My question then is: why should I add `datafusion` as a dependency of my Python package, duplicating the library? Is there a way to bundle the dependency into my binding?