Avoid the overhead of creating a PyErr for downcasting.

PyO3 / rust-numpy

PyO3-based Rust bindings of the NumPy C-API

BSD 2-Clause "Simplified" License

1.11k stars 106 forks source link

Avoid the overhead of creating a PyErr for downcasting. #326

Closed adamreichold closed 2 years ago

adamreichold commented 2 years ago

This does give us the best of both worlds, i.e. avoid having to create a PyErr for downcast but still keeping a single implementation for the extract logic. I am just not sure whether this is wort the additional effort, i.e. the extra ExtractionError enum.

kngwyu commented 2 years ago

I'm not sure how it matters in user experience. Anyway users can't get the enum and just get PyErr, right?

adamreichold commented 2 years ago

I'm not sure how it matters in user experience. Anyway users can't get the enum and just get PyErr, right?

The difference is purely internal w.r.t. performance, but it actually turns out to be a mixed bag after adding the benchmarks (I guess the enum is larger than PyErr hence slowing down everything but downcast_failure):

 name              main ns/iter  pr ns/iter  diff ns/iter   diff %  speedup 
 downcast_failure  84            52                   -32  -38.10%   x 1.62 
 downcast_success  18            21                     3   16.67%   x 0.86 
 extract_failure   84            111                   27   32.14%   x 0.76 
 extract_success   18            21                     3   16.67%   x 0.86

I therefore redid the whole thing using a more tricky IgnoreError type to just drop the error on the floor for downcast and still directly convert it into PyErr for extract.

This, together with checking pointer equality before calling into PyArray_EquivTypes, then yields an improvement across the board which I feel makes this change more reasonable:

 name              main ns/iter  pr ns/iter  diff ns/iter   diff %  speedup 
 downcast_failure  84            48                   -36  -42.86%   x 1.75 
 downcast_success  18            15                    -3  -16.67%   x 1.20 
 extract_failure   84            84                     0    0.00%   x 1.00 
 extract_success   18            15                    -3  -16.67%   x 1.20

kngwyu commented 2 years ago

The hack in is_equiv_to looks reasonable, but I'm a bit struggling in understanding what IgnoreError speeds up. It currently only works for is_type_of?

adamreichold commented 2 years ago

The hack in is_equiv_to looks reasonable, but I'm a bit struggling in understanding what IgnoreError speeds up. It currently only works for is_type_of?

Yes, it is used only for is_type_of which however backs downcast. The point of IgnoreError is that downcast does not return a PyErr, but a PyDowncastError iff is_type_of returns false.

So as indicated by the downcast_failure benchmark, calling downcast on something that does not match gets significantly faster, because no PyErr is allocated on the Python heap by is_type_of just to be dropped immediately afterwards. IgnoreError exploits this by just ignoring the error information entirely (since it will never reach a public API anyway).

Finally, failed calls to extract and downcast are important because Python methods which support multiple element data types safely will need to repeatedly try to extract/downcast a given PyAny value into any of the supported PyArray instantiations.

kngwyu commented 2 years ago

Hmm, the pypy failure looks like a bug of maturin 😓

adamreichold commented 2 years ago

Hmm, the pypy failure looks like a bug of maturin sweat

Reported at https://github.com/PyO3/maturin/issues/882

adamreichold commented 2 years ago

Will hold back on merging until Maturin 0.12.14 is released so that I can drop the last commit.