crate-py / rpds

Python bindings to the Rust rpds crate for persistent data structures
https://rpds.readthedocs.io/
MIT License
41 stars 14 forks source link

Ship free-threaded wheels #101

Open ngoldbaum opened 2 hours ago

ngoldbaum commented 2 hours ago

My real goal with #100 was to enable shipping free-threaded wheels. I'm actively working on improving the state of free-threaded support for popular packages on PyPI and this package is one of them. I'll be linking to this issue from https://py-free-threading.github.io/tracking/. See that website for lots more information on supporting free-threaded Python.

Looking at the existing tests, they don't use the python threading module at all and there aren't any rust-level tests. Is that correct?

One thing we could try without writing tests for explicit multithreaded use is running pytest-run-parallel on the full test suite. That wouldn't catch thread safety issues caused by sharing data structures provided by rpds between threads, but it would catch issues due to use of global state.

I see this library already uses new_sync to create the rpds data structures, so according to the rpds docs everything should be thread safe. Do you agree with that assessment?

But also I don't actually see any multithreaded tests in the rust rpds library itself, so I'm not sure how carefully validated that is besides the fact that cargo parallelizes tests automatically.

Do you have any ideas for ways to validate the thread safety of the python library before we upload wheels and declare the rpds module is thread-safe?

Julian commented 2 hours ago

Looking at the existing tests, they don't use the python threading module at all and there aren't any rust-level tests. Is that correct?

Correct.

I see this library already uses new_sync to create the rpds data structures, so according to the rpds docs everything should be thread safe. Do you agree with that assessment?

Yes that was my understanding as well, though I did less work than it sounds like you did already to check how well this is tested in rpds itself.

Do you have any ideas for ways to validate the thread safety of the python library before we upload wheels and declare the rpds module is thread-safe?

I like your pytest-run-parallel idea certainly -- I'd otherwise have likely added at least one test which explicitly spins up a few threads and then perhaps uses hypothesis to randomly add data across them -- but I don't think it should block calling the package thread-safe if we're relying on rpds meeting what it claims.

Said more "brutally", I'm personally OK with letting someone tell us if it turns out there are thread safety issues.

(And again sincere thanks for all the help so far and for the work you're doing more broadly!)

ngoldbaum commented 2 hours ago

OK great, I'll keep the hypothesis idea on the backburner, I don't think I want to take on setting up hypothesis to do that since I've never set up hypothesis from scratch, let alone in a multithreaded context. That said, I bet we could leverage hypothesis to make it easier to productively start from scratch on writing multithreaded tests for sharing data structures here and in many other packages. I'll bring this up at our next internal sync on our ecosystem compatibility team.