PyO3 / pyo3

Rust bindings for the Python interpreter
https://pyo3.rs
Apache License 2.0

How to pass string array parameters from Rust to PyO3 #1460

Closed rts-gordon closed 3 years ago

rts-gordon commented 3 years ago

💥 Reproducing

Hi there, I use Rust to call a Python pandas function via PyO3, but I am not sure how to pass string array parameters like "names". Could you give me a hand? Thank you very much.

pub async fn calculate_base_ohlc()-> PyResult<()> {
    debug!("test pandas");

    let gil = Python::acquire_gil();
    let py = gil.python();
    let pd = PyModule::import(py, "pandas")?;

    let arg1 = "./data/20201001.csv";
    let arg2 = "names=['s','u','c','a','v']";
    let arg3 = "index_col=1";
    let arg4 = "parse_dates=True";
    let args = PyTuple::new(py, &[arg1, arg2, arg3, arg4]);
    let df = pd.call1("read_csv", args);

    debug!("df = {:?}", df);

    Ok(())    
}

There are some compile errors:

2021-03-03 15:37:05.810243 DEBUG [ohlc::base_ohlc::pandas:15] test pandas
sys:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
2021-03-03 15:37:06.136407 DEBUG [ohlc::base_ohlc::pandas:33] df = Err(PyErr { type: <class 'ValueError'>, value: ValueError('header must be integer or list of integers'), traceback: Some(<traceback object at 0x00000238378729C0>) })

davidhewitt commented 3 years ago

@CHCP looks like some of the arguments you're trying to pass are keyword arguments.

These need to go in a PyDict, and then you can use PyModule::call instead of call1.

birkenfeld commented 3 years ago

Hi, and welcome to PyO3!

The first problem here is that passing keyword arguments needs to be done differently: the tuple in call1 is only for positional arguments. In your example you're doing the equivalent of pandas.read_csv("./data/20201001.csv", "names=['s','u','c','a','v']", "index_col=1", "parse_dates=True").

For passing keyword arguments, you need to use the "full" method, i.e. call, with a dictionary containing the keyword args. The second problem is the actual passing of the list of strings for names: for that, use an array or vector on the Rust side.

In short, you need something like this:

    let arg1 = "./data/20201001.csv";
    // Positional arguments go in a tuple.
    let args = PyTuple::new(py, &[arg1]);
    // Keyword arguments go in a dict; set_item returns a PyResult, so propagate errors with `?`.
    let kwargs = PyDict::new(py);
    kwargs.set_item("names", ["s", "u", "c", "a", "v"])?;
    kwargs.set_item("index_col", 1)?;
    kwargs.set_item("parse_dates", true)?;
    let df = pd.call("read_csv", args, Some(kwargs));

Btw, the output you showed is not "compile errors": compilation was successful and your code ran, producing the debug output you showed.

rts-gordon commented 3 years ago

Thanks @davidhewitt

rts-gordon commented 3 years ago

Thanks @birkenfeld. I changed to the "call" function, and the "names" argument should use a vec, like this:

    let list = vec!["s", "u", "c", "a", "v"].to_object(py);
    kwargs.set_item("names", list);

But I have another question: I use Python "pandas" to calculate OHLC, and the Python code is:

    df = pandas.read_csv(file, names=['s', 'u', 'c', 'a', 'v'], index_col=1, parse_dates=True)
    data_c = df['c'].resample(counts, closed='left', label='left').ohlc()
    data_a = df['a'].resample(counts, closed='left', label='left').last()
    data_v = df['v'].resample(counts, closed='left', label='left').sum()

But in Rust, "df" is a Result<&PyAny, PyErr>, so I can't call the "resample" and "ohlc/sum/last" functions on it. How can I do this?

pub async fn calculate_base_ohlc()-> PyResult<()> {
    debug!("test pandas");
    let gil = Python::acquire_gil();
    let py = gil.python();
    let pd = PyModule::import(py, "pandas")?;

    let arg1 = "./data/20201001.csv";
    let args = PyTuple::new(py, &[arg1]);
    let mut kwargs = PyDict::new(py);
    let list = vec!["s", "u", "c", "a", "v"].to_object(py);
    //kwargs.set_item("names", "['s', 'u', 'c', 'a', 'v']");
    kwargs.set_item("names", list);
    kwargs.set_item("index_col", 1);
    kwargs.set_item("parse_dates", true);
    let mut df = pd.call("read_csv", args, Some(kwargs));
    debug!("df = {:?}", df);

    let arg2 = "T";
    let args2 = PyTuple::new(py, &[arg2]);
    let mut kwargs2 = PyDict::new(py);
    kwargs2.set_item("closed", "left");
    kwargs2.set_item("label", "left");
    let res =  df.call("resample", args2, Some(kwargs2));
    let data_c = res.call("ohlc");

    Ok(())    
}

Thanks again for your help.

davidhewitt commented 3 years ago

Result<&PyAny, PyErr> (aka PyResult<&PyAny>) is an enum, which is either an Ok value containing the function result, or an Err containing the exception raised. This is how Python's exceptions map to Rust.

I see your example function returns PyResult<()>, so you can use the ? operator: it propagates the exception to the caller if there is one, and otherwise gives you the contained value.

So you should write

let mut df = pd.call("read_csv", args, Some(kwargs))?;

And now df will be an &PyAny you can do things with.
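
To illustrate what ? does here, the following is a minimal sketch of the same pattern using only the standard library, so it compiles without PyO3 or Python. FakeError and load are hypothetical stand-ins for PyErr and for a fallible call like pd.call; they are not part of PyO3.

```rust
// Hypothetical stand-in for PyErr (an assumption for illustration only).
#[derive(Debug)]
struct FakeError(String);

// Hypothetical stand-in for a fallible call such as pd.call("read_csv", ...).
fn load(path: &str) -> Result<Vec<u32>, FakeError> {
    if path.is_empty() {
        Err(FakeError("empty path".to_string()))
    } else {
        Ok(vec![1, 2, 3])
    }
}

fn total(path: &str) -> Result<u32, FakeError> {
    // `load(path)` alone is a Result, which has no `iter` method.
    // With `?`, an Err returns early from `total`; an Ok is unwrapped
    // so that `values` is the Vec<u32> itself.
    let values = load(path)?;
    Ok(values.iter().sum())
}
```

With these definitions, total("data.csv") evaluates to Ok(6), while total("") returns the Err produced inside load without running the summing line. The same shape applies to df: once the Result is unwrapped with ?, the value inside is what you call further methods on.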

birkenfeld commented 3 years ago

Can you try with kwargs.set_item("names", vec!["s", "u", "c", "a", "v"]);?

To call methods on the resulting PyAny, use the call_method family of methods.