Closed Rik-de-Kort closed 3 years ago
Per https://github.com/HypothesisWorks/hypothesis/blob/master/README.rst#hypothesis-for-other-languages
[Hypothesis for Ruby] worked pretty well, but used a core Rust implementation that is unfortunately not compatible with recent versions of Rust, due to its dependency on Helix (which now seems to be mostly unmaintained). As a result it is currently unsupported pending a rewrite of the bridging code between Rust and Ruby. We don't at present have the time or funding for this project, but it is likely not a massive undertaking if anyone would like to provide either of these.
If you're interested in helping to bring this back to life, awesome! I'd suggest starting with a port of https://github.com/DRMacIver/minithesis to Rust, and then we can look at the bridging code to Ruby or Python (and perhaps using it as a proptest backend in Rust itself?).
Thanks for the response, I will get going on it!
Hi! @Rik-de-Kort let me know if we can join forces :) I was thinking about working on this area for a long time :)
Hi @Stranger6667, for now I'm still getting my head around the core of minithesis (time is a bit limited). I would like to implement the basic functionality (test harness, generation, and shrinking) first, and then we can start hacking together if that's alright with you?
Yep, that's great :)
Some basic stuff up at https://github.com/Rik-de-Kort/minithesis-rust/
Will get to work on shrinking tomorrow or Tuesday. :) Reviews and comments always welcome.
I think the main ideas are present now. @Stranger6667 would you mind having a look? I think you're a bit more used to programming Rust.
My main gripe with the current implementation is that self.consider exists at all: I don't think it should be necessary. Additionally, it too closely mimics the Python implementation, which has a lot of try/except, and that's not Rusty™.
@Rik-de-Kort
Awesome! I will take a look today/tomorrow :)
Hi @Zac-HD, there's a lot of stuff in the repo currently (here: https://github.com/Rik-de-Kort/minithesis-rust). Would you mind having a look? We're still missing some top-level stuff (namely a test-runner-decorator-thing that can also tell you if something is unsatisfiable), but inner functionality is mostly complete. We did shift some APIs around, mostly to let errors bubble up using Result rather than using exception-based control flow. Would be happy to hear your comments!
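A minimal sketch of what "errors bubble up using Result" could look like, not taken from the minithesis-rust repo. The names `StopReason`, `TestCase::for_choices`, and `pair_with_sum_at_most` are hypothetical, and the test case here only replays a fixed list of choices (random generation is left out to keep the example small).

```rust
/// Reasons a test case can stop early. Hypothetical names, loosely modelled
/// on the exceptions the Python minithesis raises.
#[derive(Debug, PartialEq)]
pub enum StopReason {
    /// No recorded choice is left, or the size budget is used up.
    Overrun,
    /// A recorded choice was out of range, or an assumption failed.
    Invalid,
}

/// A replay-only test case: it reads choices from a fixed prefix.
/// (Assumption: random generation past the prefix is left out here.)
pub struct TestCase {
    prefix: Vec<u64>,
    choices: Vec<u64>,
    max_size: usize,
}

impl TestCase {
    pub fn for_choices(prefix: Vec<u64>, max_size: usize) -> Self {
        TestCase { prefix, choices: Vec::new(), max_size }
    }

    /// Returns the next recorded choice, which must lie in 0..=n.
    pub fn choice(&mut self, n: u64) -> Result<u64, StopReason> {
        if self.choices.len() >= self.max_size {
            return Err(StopReason::Overrun);
        }
        let value = *self
            .prefix
            .get(self.choices.len())
            .ok_or(StopReason::Overrun)?;
        if value > n {
            return Err(StopReason::Invalid);
        }
        self.choices.push(value);
        Ok(value)
    }

    /// Turns a failed assumption into an early return for the caller.
    pub fn assume(&self, condition: bool) -> Result<(), StopReason> {
        if condition { Ok(()) } else { Err(StopReason::Invalid) }
    }
}

/// A generator built from choices: `?` propagates any stop reason upwards,
/// where the Python version would raise and catch an exception.
pub fn pair_with_sum_at_most(
    tc: &mut TestCase,
    limit: u64,
) -> Result<(u64, u64), StopReason> {
    let a = tc.choice(limit)?;
    let b = tc.choice(limit)?;
    // checked_add avoids overflow for very large limits.
    tc.assume(a.checked_add(b).map_or(false, |sum| sum <= limit))?;
    Ok((a, b))
}
```

The runner can then `match` on the returned `Result` in one place instead of wrapping every call in exception handling.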
Happy to! Last week was packed, but I have some more time now :smile: Initial thoughts:
It looks like the database maps each key to a single value, but Hypothesis has a set of values for each key (which may be empty). We also annotate this as bytes -> Iterable[bytes] because we only guarantee that you can iterate over the values (not do set ops), but they're informally expected to be unique.
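As a rough illustration of that shape, here is an in-memory sketch in which each key maps to a possibly-empty set of byte strings and callers only get an iterator back. `InMemoryDatabase` and its method names are assumptions made for this example, not the API of minithesis-rust or of Hypothesis.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical in-memory database: every key owns a set of unique values.
#[derive(Default)]
pub struct InMemoryDatabase {
    data: HashMap<Vec<u8>, BTreeSet<Vec<u8>>>,
}

impl InMemoryDatabase {
    pub fn new() -> Self {
        Self::default()
    }

    /// Adds `value` under `key`; saving the same value twice is a no-op.
    pub fn save(&mut self, key: &[u8], value: &[u8]) {
        self.data
            .entry(key.to_vec())
            .or_default()
            .insert(value.to_vec());
    }

    /// Removes `value` from `key` if it is present.
    pub fn delete(&mut self, key: &[u8], value: &[u8]) {
        if let Some(values) = self.data.get_mut(key) {
            values.remove(value);
        }
    }

    /// Only iteration is promised, mirroring `bytes -> Iterable[bytes]`.
    /// An unknown key yields an empty iterator rather than an error.
    pub fn fetch<'a>(&'a self, key: &[u8]) -> impl Iterator<Item = &'a [u8]> + 'a {
        self.data
            .get(key)
            .into_iter()
            .flat_map(|values| values.iter().map(Vec::as_slice))
    }
}
```

Returning `impl Iterator` keeps the set type private, so a directory-backed implementation could stream values from disk behind the same signature.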
For struct TestCase, we make targeting_score a potentially-empty map of labels to numeric scores. This enables multi-objective optimisation, though I realise that supporting various numeric types takes more work in Rust than Python.
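One possible way to express this in Rust, sketched under the assumption that every score can be stored as an `f64`. The field name `targeting_scores` and the `target` method are hypothetical; accepting `impl Into<f64>` covers the numeric types with a lossless conversion (such as `u32`, `i32`, and `f32`) but not `i64` or `u64`.

```rust
use std::collections::BTreeMap;

/// Hypothetical slice of a test case: only the targeting part is shown.
#[derive(Default)]
pub struct TestCase {
    /// Potentially-empty map from label to score.
    pub targeting_scores: BTreeMap<String, f64>,
}

impl TestCase {
    /// Records `score` under `label`. In this sketch a repeated label
    /// simply overwrites the earlier score.
    pub fn target(&mut self, label: &str, score: impl Into<f64>) {
        self.targeting_scores.insert(label.to_string(), score.into());
    }
}
```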
There are a bunch of traits which it would be nice to interoperate with - the arbitrary crate (fuzzing), proptest::arbitrary::Arbitrary, quickcheck::Arbitrary, etc. Per here making types the only API is (we think) a mistake in a PBT interface, but, as the existence of st.from_type() suggests, it's a useful option if it can be integrated into a more customised strategy.
The possibilities and generation logic look fine. The reduction/shrinking logic also looks fine, though there's obviously a fair distance between minithesis and the full Hypothesis backend :wink:
If the goal is to replace the conjecture backend with a Rust version, the next steps are probably to work out how to call Rust code from Python, and then to implement e.g. the ConjectureData class. That would be a big project, since it touches so many things, but on the other hand it would also be usable pretty much immediately. Or you might want to try translating a more isolated part first, e.g. the Pareto front logic is self-contained and performance-sensitive.
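For readers unfamiliar with the term, a Pareto front keeps only the candidates that no other candidate beats on every objective. The sketch below shows the general technique on plain score vectors (higher is better); it is not a port of the Hypothesis Pareto front, which has its own rules, and the names `dominates` and `ParetoFront` are made up for this example.

```rust
/// True if `a` dominates `b`: at least as good on every objective and
/// strictly better on at least one. Higher scores are better.
pub fn dominates(a: &[f64], b: &[f64]) -> bool {
    a.len() == b.len()
        && a.iter().zip(b).all(|(x, y)| x >= y)
        && a.iter().zip(b).any(|(x, y)| x > y)
}

/// Hypothetical container holding only mutually non-dominated points.
#[derive(Default)]
pub struct ParetoFront {
    points: Vec<Vec<f64>>,
}

impl ParetoFront {
    /// Tries to add `candidate`; returns whether it was kept.
    /// A kept candidate evicts every point it dominates.
    pub fn add(&mut self, candidate: Vec<f64>) -> bool {
        let rejected = self
            .points
            .iter()
            .any(|p| dominates(p, &candidate) || *p == candidate);
        if rejected {
            return false;
        }
        self.points.retain(|p| !dominates(&candidate, p));
        self.points.push(candidate);
        true
    }

    pub fn points(&self) -> &[Vec<f64>] {
        &self.points
    }
}
```

Each `add` scans the whole front, so the cost grows with the number of retained points, which is one reason this kind of logic is performance-sensitive.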
I don't have much to comment about "targeting" / "arbitrary" parts (I need to dive deeper into these areas), but there are some thoughts about other relevant things.
Database. This implementation comes from a more complete version. I adapted it to match the minithesis interface, but the original one passes all non-Python-specific tests from the Hypothesis test suite on the Rust side and on the Python binding side as well.
I also tried to run the Hypothesis test suite with replaced DB classes. Most of the tests passed, except ones that fail due to the ExampleDatabase subclassing check and some introspection issues (checking the data attribute in tests). Otherwise, it is a drop-in replacement that, for the directory-based version, is 1.5x-4x faster than the current implementation. The in-memory version is slower in some operations and faster in others. I assume this is mostly because at that scale (nanoseconds), converting from Rust-side structures to Python ones takes enough time to be visible.
Python bindings. Here are bindings implemented with PyO3 for that database part. There is a bit of unsafe, which I couldn't find a way to avoid while still having a proper iterator protocol :( Generally, there are some corner cases with PyO3 which imply some performance overhead. E.g., it doesn't support generics or lifetimes (other than 'static) in structs, but it can probably be worked around with some manual (and potentially unsafe) reference handling. Otherwise, it supports Python 3.6-3.9 and Linux/macOS/Windows. Also, I have an article on implementing Python bindings for Rust crates with many more technical details, which may be relevant to implementing this kind of interop.
Or you might want to try translating a more isolated part first, e.g. the Pareto front logic is self-contained and performance-sensitive.
To me, it sounds like a good way to go - I found that the charmap implementation is quite isolated as well. The nice part is that there is already a crate that generates static tables with Unicode interval data, so even with an incomplete implementation, it is an order of magnitude faster than the current implementation (if not taking caching into account). Having static Unicode interval tables on the Python side would give similar performance, though.
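To make "static tables with Unicode intervals" concrete, here is a small sketch of lookups over a sorted table of inclusive code point ranges. The table `ASCII_ALNUM` is hand-written for illustration (a real table would be generated from Unicode data for a specific Unicode version), and the function names are assumptions rather than the API of any existing crate.

```rust
use std::cmp::Ordering;

/// Sorted, non-overlapping, inclusive code point ranges.
/// Hand-written for illustration: ASCII digits, upper case, lower case.
pub static ASCII_ALNUM: &[(u32, u32)] = &[(0x30, 0x39), (0x41, 0x5A), (0x61, 0x7A)];

/// Membership test by binary search over the intervals.
pub fn contains(table: &[(u32, u32)], c: char) -> bool {
    let cp = c as u32;
    table
        .binary_search_by(|&(start, end)| {
            if end < cp {
                Ordering::Less
            } else if start > cp {
                Ordering::Greater
            } else {
                Ordering::Equal
            }
        })
        .is_ok()
}

/// Total number of code points covered by the table.
pub fn size(table: &[(u32, u32)]) -> u32 {
    table.iter().map(|&(start, end)| end - start + 1).sum()
}

/// The `index`-th code point of the table in ascending order, which lets
/// a generator pick a character from a single integer choice.
pub fn nth(table: &[(u32, u32)], index: u32) -> Option<char> {
    let mut remaining = index;
    for &(start, end) in table {
        let len = end - start + 1;
        if remaining < len {
            return char::from_u32(start + remaining);
        }
        remaining -= len;
    }
    None
}
```

Because the table is a `static` slice, nothing has to be computed or cached at start-up, which is where most of the speed-up over building the intervals at runtime would come from.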
Minithesis. Probably we can submit a PR to add this port to the ports list? There are also a few public API ideas that could be implemented (the builder pattern and a proc-macro), but it seems like the core parts are implemented, and those API ideas are not critical to have. What would implementing the conjecture backend look like in terms of code organization? A separate crate within minithesis, or maybe a separate repo? I could suggest something like ripgrep, which is a multi-crate project within the same workspace.
Btw, I use some copy-pasted and adapted Hypothesis code in my experiments repo; I keep the license header, but not completely sure if it is alright to do - let me know if it is not ok, then I'll remove it :)
Seems like we're very complementary, @Stranger6667, because I was busy looking into arbitrary and friends! I think it's very doable to support the arbitrary crate, but I'm not sure it's going to be very useful in the short term due to the underlying libFuzzer being mostly implemented in C++. QuickCheck's Arbitrary requires a type-defined shrinker, so maybe we could look into using that down the road. proptest is basically Hypothesis, and so its Arbitrary is essentially our Possibility, or what it will be down the road.
I like the idea of implementing the pareto front! Maybe we should tackle things in parallel?
I'm down with the PR for MT. I'll get to it later today.
@Rik-de-Kort Nice! Support for Arbitrary is definitely optional, but as I understand it there's been a reasonable amount of work to generate things via those traits, and leveraging it is probably a good way to make it easier for early adopters.
@Stranger6667 also very nice!
For charmap, just be careful about which version of Unicode is in use - different Python versions use different Unicode versions, and that means that the set of available characters and the behaviour of those characters is not always stable. If the Rust side supports several, we can specify one via unicodedata.unidata_version though.
@Rik-de-Kort
I like the idea of implementing the pareto front! Maybe we should tackle things in parallel?
Agree! I have some prototypes for intervalset.py & charmap.py, but probably we can think about some kind of roadmap to coordinate all the efforts.
@Zac-HD
Thanks for the feedback! I am definitely inclined to build some significant blocks first, before considering further steps - the DB part was quite isolated from everything else :) For the code organization part, I don't know yet either, but I'll keep the code isolated so it is easier to move around in the future :)
For charmap, just be careful about which version of Unicode is in use
Yep, I added versions 9.0 - 13.0, but it is trivial to add more since there is already a script for that :)
Using Hypothesis code is fine; you just need to stick to the MPL licence (it's basically file-level copyleft)
Cool! Good to know that :)
If we could speak about long-running plans on the Rust implementation, then I think it would be nice to discuss the expectations.
Generally, there are a lot of aspects regarding maintainability, CI, new tooling, bindings, etc that need to be discussed at some point, but probably not now.
Most of these decisions, especially about longer-term plans, will need to be approved specifically by @DRMacIver. It sounds to me like we're on a good path though :smiley:
If someone could clarify, why is minithesis-rust preferable to using the existing conjecture-rust? Does minithesis implement newer versions of Hypothesis algorithms or something like that?
The reason I ask is that I'm primarily interested in reviving hypothesis-ruby, and I was playing around with simply replacing the current Ruby bindings to conjecture-rust with Rutie, which is actively maintained. I have some basic examples working through the Rust binding layer (any(integers) and any(arrays(of: integers)) are partially working) in a branch here.
I'd love to start using hypothesis-ruby and am willing to help out with any endeavors there. But if conjecture-rust is itself deprecated, I would stop trying to replace the ruby binding layer to it and help out with other areas.
Hey @amw-zero! As I understand it updating the bindings would be sufficient to get hypothesis-ruby working again and we'd be delighted to accept a PR which does so :grin:
On the Python side, we've simply accumulated a lot more fiddly backend to implement before we could switch over from conjecture-python to conjecture-rust, and so implementing smaller parts gives us a smoother migration pathway. I don't think anyone wants to review a 20,000-line diff to the core of the library, but incremental steps are both manageable and useful!
IIRC conjecture-rust is also more adapted to the Ruby frontend than a direct port of conjecture-python, so we'd have to work out a much cleaner cross-language interface layer. That's definitely going to happen at some point, but I think it makes sense to e.g. sort out our build system and CI, and move some slow data structures into Rust before porting the Python backend entirely.
Ok cool! I didn't want to hijack the thread, but my question is also related to the status of conjecture-rust. It sounds like what I'm working on is unrelated to the overall move towards Rust, though maybe one day all of the platforms could share a common core. Agreed that that would need to happen incrementally though.
I would call it related - a common core is definitely the goal, and incremental steps the plan. The best way to work out what that common core should be is probably to implement the parts we need for each though, and then refactor + unify them later.
Closing this, as conjecture-rust is now alive and well: we have CI jobs running for code formatting, linting & tests :)
We'd be happy to see issues & PRs for features and improvements for conjecture-rust :)
Thank you, everybody! :)
Hi there, I was looking to do some Rust hacking, and being a long-time fan of Hypothesis, I thought it might be fun to work on conjecture-rust. However, the last commit was two years ago and all the plans seem to be from around that time as well. Is conjecture-rust still slated to replace conjecture-python?