PyO3 / rust-numpy

PyO3-based Rust bindings of the NumPy C-API
BSD 2-Clause "Simplified" License

Failed to execute the basic example #328

Closed: GoodManWEN closed this issue 2 years ago

GoodManWEN commented 2 years ago

Hi everyone, I love this project, as it meets exactly my current needs. However, as a 'hello world' test, I'm having trouble running the rust-parallel example.

According to the notes in the README, I first need to run nox to create a virtual environment. Since I am not familiar with nox, I created the environment myself and installed the dependency manually.

~# python3 -m venv .env
~# source .env/bin/activate
(.env) ~# pip install numpy

Then I created a new project using maturin and manually copied all the files in rust-numpy/examples/parallel/ over the generated ones.

(.env) ~# mkdir rust_parallel
(.env) ~# cd rust_parallel
(.env) ~/rust_parallel# maturin init
(.env) ~/rust_parallel# cp -r ../rust-numpy/examples/parallel/ .

When I run maturin develop, I get the following error:

(.env) ~/rust_parallel# maturin develop
💥 maturin failed
  Caused by: Cargo metadata failed. Does your crate compile with `cargo build`?
  Caused by: `cargo metadata` exited with an error: error: failed to load manifest for dependency `numpy`

Caused by:
  failed to read `/home/rust/Cargo.toml`

Caused by:
  No such file or directory (os error 2)
(.env) ~/rust_parallel# ^C

The error message suggests that numpy is not installed, which is not the case, and the Cargo.toml path it points to is also wrong. I'm not sure which step went wrong and would appreciate any help, thanks.

Besides, since this is my first time using this project, I have a few other questions. First, since the documentation does not state it explicitly, I would like to know whether this project avoids type-conversion and copying overhead between Python and Rust. As far as I know, if you pass a two-dimensional array to Rust the plain PyO3 way, each element may have to go through a conversion from numpy.int to Python int and then to Rust usize, which consumes a lot of resources when there is a lot of data. It would be great if this project avoided that.

Second, the documentation is not quite clear to me about what the examples/parallel project is meant to demonstrate. Since it is named 'parallel', does it mean that this project can get around Python's GIL and use multiple cores simultaneously within one process (which is necessary in production environments)? I'm interested in the par_map_collect function called in the example, which seems to serve that role, but unfortunately I couldn't find anything about it in the documentation.

davidhewitt commented 2 years ago

The example declares numpy as a path dependency: https://github.com/PyO3/rust-numpy/blob/19bfc9d2d0e72bb3ed30c08244ee81abd02d7386/examples/parallel/Cargo.toml#L13

You will need to update this to be the version of numpy you want to use.

GoodManWEN commented 2 years ago

Thanks for the reply, but I don't quite understand the syntax under [dependencies]. Does it mean I have to point to a dependency that is not Rust but Python? Should I put the numpy directory from site-packages here?

adamreichold commented 2 years ago

This is about the Rust-side dependency on the numpy crate, not the Python package. In this particular case, you need to replace the path dependency (which only works when the example lives inside this Git repository) with a version number, so that Cargo pulls the crate from crates.io, e.g.

[dependencies]
numpy = "0.16"

messense commented 2 years ago

I'd recommend reading the Cargo documentation on dependency management if you are new to Rust.

adamreichold commented 2 years ago

One is that, since it is not explicitly stated in the documentation, I would like to know if this project avoids type conversion and type copy overhead between py and rust. As we know, if you pass a two-dimensional array to rust with pyo3 way, you may need to go through a type conversion from numpy.int to python.int then to rust.usize, which will consume a lot of resources when there is a lot of data, and I think it would be great if this was avoided in this project.

This crate only wraps NumPy ndarray instances; it does not copy or convert them. This also implies that if your Rust-side function expects e.g. PyReadonlyArray2<f64>, but you pass in a one-dimensional array or one storing numpy.int, the call will fail due to a type mismatch.
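
To make the "wraps, does not convert" point concrete without pulling in PyO3, here is a std-only Rust sketch. Everything in it (`Buffer`, `DynArray`, `extract_f64_2d`) is hypothetical and only models the behavior described above: the element type is a runtime tag like a NumPy dtype, extraction borrows the existing buffer, and a wrong dtype or dimensionality is an error rather than a silent conversion. It is not the rust-numpy API.

```rust
/// Hypothetical, simplified model of a NumPy array: the element type is a
/// runtime tag (like a dtype). This is NOT the rust-numpy API.
#[derive(Debug)]
enum Buffer {
    F64(Vec<f64>),
    I64(Vec<i64>),
}

#[derive(Debug)]
struct DynArray {
    buffer: Buffer,
    shape: Vec<usize>,
}

#[derive(Debug, PartialEq)]
enum ExtractError {
    DtypeMismatch,
    NdimMismatch { expected: usize, found: usize },
}

/// Borrow the data as a 2-D f64 array: returns (data, rows, cols).
/// Nothing is copied; the returned slice points into the original buffer.
/// Assumes the shape is consistent with the buffer length.
fn extract_f64_2d(arr: &DynArray) -> Result<(&[f64], usize, usize), ExtractError> {
    if arr.shape.len() != 2 {
        return Err(ExtractError::NdimMismatch { expected: 2, found: arr.shape.len() });
    }
    match &arr.buffer {
        Buffer::F64(data) => Ok((data.as_slice(), arr.shape[0], arr.shape[1])),
        // No implicit int -> float conversion: a different dtype is an error.
        Buffer::I64(_) => Err(ExtractError::DtypeMismatch),
    }
}
```

In rust-numpy itself the corresponding check happens when the Python argument is extracted into the declared array type, which is why the mismatch surfaces as an error at call time.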

All operations on these arrays are then typically performed in terms of ArrayView, provided by the ndarray crate, which likewise only wraps the existing data without any copying or conversion; such a view is obtained via the PyReadonlyArray::as_array method.
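
The view idea can be sketched in plain Rust as well. `View2` below is a hypothetical, heavily simplified stand-in for ndarray's `ArrayView2`, not its actual implementation: it borrows a slice and stores a shape and strides, so constructing or even transposing the view touches no elements.

```rust
/// Hypothetical stand-in for a borrowed 2-D view: a reference into existing
/// memory plus shape and strides. Creating or transposing it copies nothing.
#[derive(Clone, Copy, Debug)]
struct View2<'a> {
    data: &'a [f64],
    shape: (usize, usize),
    strides: (usize, usize), // measured in elements, not bytes
}

impl<'a> View2<'a> {
    /// View a row-major (C-contiguous) buffer; panics if the shape does not fit.
    fn from_row_major(data: &'a [f64], rows: usize, cols: usize) -> Self {
        assert_eq!(data.len(), rows * cols, "shape does not match buffer length");
        View2 { data, shape: (rows, cols), strides: (cols, 1) }
    }

    /// Read element (i, j) through the strides.
    fn get(&self, i: usize, j: usize) -> f64 {
        assert!(i < self.shape.0 && j < self.shape.1, "index out of bounds");
        self.data[i * self.strides.0 + j * self.strides.1]
    }

    /// Transpose by swapping shape and strides: O(1), no data movement.
    fn t(self) -> Self {
        View2 {
            data: self.data,
            shape: (self.shape.1, self.shape.0),
            strides: (self.strides.1, self.strides.0),
        }
    }

    /// Sum all elements by walking the view.
    fn sum(&self) -> f64 {
        let mut acc = 0.0;
        for i in 0..self.shape.0 {
            for j in 0..self.shape.1 {
                acc += self.get(i, j);
            }
        }
        acc
    }
}
```

Because every view just reinterprets the same borrowed memory, the Python-owned buffer is never duplicated, which is the property the comment above describes for ArrayView.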

Another point is that due to some kind of missing clarity I'm not quite sure what the examples/parallel project is aimed at. Since its project name is 'parallel', does it mean that this project can break the GIL limit of python and get multiple cores used at the same time in one process for simultaneous computation (which is necessary in production environments). I'm interested in the par_map_collect function called in the example, which seems to serve such a role, but unfortunately I didn't find the relevant search results in the documentation.

As written above, operating on the array contents is usually done via ArrayView instances from the ndarray crate, whose Zip type provides the par_map_collect method (with ndarray's rayon feature enabled). This way, while the GIL is still held, the actual numerical work is dispatched into Rayon's thread pool and thereby uses multiple hardware threads.
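
A rough std-only analogue of that parallel step is sketched below. It assumes a row-major matrix and computes one dot product per row, similar in spirit to what the example does with par_map_collect, but it uses `std::thread::scope` with a fixed split instead of Rayon's work-stealing pool, and `par_rows_dot` is a hypothetical name. The input is only borrowed, and each thread writes into its own disjoint slice of the output.

```rust
use std::thread;

/// Compute `row . y` for every row of a row-major `rows x cols` matrix,
/// splitting the rows across up to `n_threads` scoped threads.
/// Inputs are only borrowed; each thread fills a disjoint output chunk.
fn par_rows_dot(x: &[f64], cols: usize, y: &[f64], n_threads: usize) -> Vec<f64> {
    assert_eq!(y.len(), cols, "vector length must equal number of columns");
    assert!(cols > 0 && x.len() % cols == 0, "matrix buffer has the wrong length");
    let rows = x.len() / cols;
    let mut out = vec![0.0; rows];
    if rows == 0 {
        return out;
    }
    let n_threads = n_threads.clamp(1, rows);
    // ceil(rows / n_threads) rows per thread
    let rows_per_chunk = (rows + n_threads - 1) / n_threads;

    // All spawned threads are joined when the scope ends.
    thread::scope(|s| {
        for (x_chunk, out_chunk) in x
            .chunks(rows_per_chunk * cols)
            .zip(out.chunks_mut(rows_per_chunk))
        {
            s.spawn(move || {
                for (row, o) in x_chunk.chunks(cols).zip(out_chunk.iter_mut()) {
                    *o = row.iter().zip(y).map(|(a, b)| a * b).sum();
                }
            });
        }
    });
    out
}
```

Rayon's pool additionally balances uneven work between threads, which this fixed split does not attempt.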