Status of conjecture-rust?

Rik-de-Kort commented 4 years ago

Hi there, I was looking to do some Rust hacking, and being a long-time fan of Hypothesis, I thought it might be fun to work on conjecture-rust. However, the last commit was two years ago and all the plans seem to date from around that time as well. Is conjecture-rust still slated to replace conjecture-python?

Zac-HD commented 4 years ago

Per https://github.com/HypothesisWorks/hypothesis/blob/master/README.rst#hypothesis-for-other-languages

[Hypothesis for Ruby] worked pretty well, but used a core Rust implementation that is unfortunately not compatible with recent versions of Rust, due to its dependency on Helix (which now seems to be mostly unmaintained). As a result it is currently unsupported pending a rewrite of the bridging code between Rust and Ruby. We don't at present have the time or funding for this project, but it is likely not a massive undertaking if anyone would like to provide either of these.

If you're interested in helping to bring this back to life, awesome! I'd suggest starting with a port of https://github.com/DRMacIver/minithesis to Rust, and then we can look at the bridging code to Ruby or Python (and perhaps at using it as a proptest backend in Rust itself?).

Rik-de-Kort commented 4 years ago

Thanks for the response, I will get going on it!

Stranger6667 commented 4 years ago

Hi! @Rik-de-Kort let me know if we can join forces :) I was thinking about working on this area for a long time :)

Rik-de-Kort commented 4 years ago

Hi @Stranger6667, for now I'm still getting my head around the core of minithesis (time is a bit limited). I would like to implement the basic functionality (test harness, generation, and shrinking) first, and then we can start hacking together, if that's alright with you?

Stranger6667 commented 4 years ago

Yep, that's great :)

Rik-de-Kort commented 4 years ago

Some basic stuff up at https://github.com/Rik-de-Kort/minithesis-rust/

Will get to work on shrinking tomorrow or Tuesday. :) Reviews and comments are always welcome.

Rik-de-Kort commented 4 years ago

I think the main ideas are present now. @Stranger6667 would you mind having a look? I think you're a bit more used to programming Rust.

My main gripe with the current implementation is that self.consider exists at all: I don't think it should be necessary. Additionally, it mimics the Python implementation too closely, which relies on a lot of try/except, and that's not Rusty™.
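For context: minithesis's shrinker repeatedly proposes a simpler choice sequence and uses `consider` to check whether the test still fails on it; deleting chunks of choices is one of its passes. A minimal sketch of such a pass, assuming choices are plain `u64`s and that the "does it still fail?" check is passed in as a closure instead of being a method on the testing state. `delete_chunks` is a hypothetical name, and this simplified version scans left to right, which is not necessarily the order minithesis uses.

```rust
/// Simplified "delete chunks" shrink pass: try removing runs of `k` consecutive
/// choices (largest runs first) and keep any removal after which `still_fails`
/// still reports a failure. Repeats until a full sweep makes no progress.
pub fn delete_chunks<F>(mut current: Vec<u64>, mut still_fails: F) -> Vec<u64>
where
    F: FnMut(&[u64]) -> bool,
{
    let mut improved = true;
    while improved {
        improved = false;
        for k in [8usize, 4, 2, 1] {
            let mut i = 0;
            while i + k <= current.len() {
                // Candidate sequence with current[i..i + k] removed.
                let candidate: Vec<u64> =
                    current[..i].iter().chain(&current[i + k..]).copied().collect();
                if still_fails(candidate.as_slice()) {
                    // Accept the shorter sequence and retry at the same index,
                    // since the remaining choices have shifted left.
                    current = candidate;
                    improved = true;
                } else {
                    i += 1;
                }
            }
        }
    }
    current
}
```

Because the predicate is just a closure, the pass needs no access to the runner's internals; every accepted candidate is strictly shorter, so the loop always terminates.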

Stranger6667 commented 4 years ago

@Rik-de-Kort

Awesome! I will take a look today/tomorrow :)

Rik-de-Kort commented 3 years ago

Hi @Zac-HD, there's a lot of stuff in the repo currently (here: https://github.com/Rik-de-Kort/minithesis-rust). Would you mind having a look? We're still missing some top-level pieces (namely a test-runner-decorator-thing that can also tell you if something is unsatisfiable), but the inner functionality is mostly complete. We did shift some APIs around, mostly to let errors bubble up using Result rather than exception-based control flow. Would be happy to hear your comments!
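In Python minithesis, a test case that cannot continue raises `StopTest` (with a status such as OVERRUN or INVALID). A minimal sketch of the Result-based alternative described above, assuming a `TestCase` that only replays a fixed prefix of `u64` choices (no random source). The names `Stop`, `choice`, and `assume` are illustrative here, not the actual minithesis-rust API.

```rust
/// Why a test case stopped early; Python minithesis raises StopTest instead.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stop {
    /// The replayed choice sequence ran out of values.
    Overrun,
    /// The drawn data was rejected (out of range, or a failed assumption).
    Invalid,
}

pub struct TestCase {
    prefix: Vec<u64>,
    pub choices: Vec<u64>,
}

impl TestCase {
    /// Replay exactly the given choices; drawing past the end is an overrun.
    pub fn for_choices(prefix: Vec<u64>) -> Self {
        TestCase { prefix, choices: Vec::new() }
    }

    /// Draw a value in 0..=n, returning Err instead of raising.
    pub fn choice(&mut self, n: u64) -> Result<u64, Stop> {
        let value = *self.prefix.get(self.choices.len()).ok_or(Stop::Overrun)?;
        self.choices.push(value);
        if value > n {
            return Err(Stop::Invalid);
        }
        Ok(value)
    }

    /// Reject the test case unless `condition` holds.
    pub fn assume(&self, condition: bool) -> Result<(), Stop> {
        if condition { Ok(()) } else { Err(Stop::Invalid) }
    }
}

/// Example test body: `?` propagates any Stop to the runner.
pub fn draw_pair(tc: &mut TestCase) -> Result<(u64, u64), Stop> {
    let a = tc.choice(100)?;
    tc.assume(a % 2 == 0)?;
    let b = tc.choice(100)?;
    Ok((a, b))
}
```

The runner then matches on the returned `Result` to decide whether the test case was valid, invalid, or an overrun, with no unwinding involved.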

Zac-HD commented 3 years ago

Happy to! Last week was packed, but I have some more time now :smile: Initial thoughts:

It looks like the database maps each key to a single value, but Hypothesis has a set of values for each key (which may be empty). We also annotate this as bytes -> Iterable[bytes] because we only guarantee that you can iterate over the values (not do set ops), but they're informally expected to be unique.
For struct TestCase, we make targeting_score a potentially-empty map of labels to numeric scores. This enables multi-objective optimisation, though I realise that supporting various numeric types takes more work in Rust than in Python.
There are a bunch of traits which it would be nice to interoperate with: the arbitrary crate (fuzzing), proptest::arbitrary::Arbitrary, quickcheck::Arbitrary, etc. Per here, making types the only API is (we think) a mistake in a PBT interface, but as the existence of st.from_type() suggests, it's a useful option if it can be integrated into a more customised strategy.
The possibilities and generation logic look fine. The reduction/shrinking logic also looks fine, though there's obviously a fair distance between minithesis and the full Hypothesis backend :wink:
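The database point above can be sketched as follows. Hypothesis's `ExampleDatabase` exposes `save(key, value)`, `fetch(key)`, and `delete(key, value)`, with each key mapping to a possibly-empty collection of values. This is a minimal in-memory sketch, assuming a `HashMap` of `HashSet`s, with `fetch` returning only an iterator so callers rely on iteration rather than set operations. `InMemoryDb` is a hypothetical name.

```rust
use std::collections::{HashMap, HashSet};

/// Each key maps to a set of byte-string values (possibly empty),
/// mirroring the `bytes -> Iterable[bytes]` contract.
#[derive(Default)]
pub struct InMemoryDb {
    data: HashMap<Vec<u8>, HashSet<Vec<u8>>>,
}

impl InMemoryDb {
    /// Add `value` under `key`; saving the same value twice is a no-op.
    pub fn save(&mut self, key: &[u8], value: &[u8]) {
        self.data.entry(key.to_vec()).or_default().insert(value.to_vec());
    }

    /// Iterate over the values for `key` in no particular order;
    /// an unknown key yields an empty iterator.
    pub fn fetch<'a>(&'a self, key: &[u8]) -> impl Iterator<Item = &'a [u8]> + 'a {
        self.data.get(key).into_iter().flatten().map(|v| v.as_slice())
    }

    /// Remove `value` from `key` if present.
    pub fn delete(&mut self, key: &[u8], value: &[u8]) {
        if let Some(values) = self.data.get_mut(key) {
            values.remove(value);
        }
    }
}
```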

If the goal is to replace the conjecture backend with a Rust version, the next steps are probably to work out how to call Rust code from Python, and then to implement e.g. the ConjectureData class. That would be a big project, since it touches so many things, but on the other hand it would also be usable pretty much immediately. Or you might want to try translating a more isolated part first, e.g. the Pareto front logic is self-contained and performance-sensitive.
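To give a feel for the Pareto front idea, here is a simplified sketch that compares examples only by a label-to-score map (the targeting scores mentioned above); Hypothesis's own dominance check also takes other things, such as shrink order, into account. It assumes scores are `f64` and treats a label missing from one side as negative infinity, which is an assumption of this sketch. `dominates` and `insert` are hypothetical names.

```rust
use std::collections::HashMap;

/// Target scores for one example: label -> score.
pub type Scores = HashMap<String, f64>;

/// True if `a` dominates `b`: at least as good on every label and strictly
/// better on at least one. Missing labels count as negative infinity.
pub fn dominates(a: &Scores, b: &Scores) -> bool {
    let score = |s: &Scores, k: &String| s.get(k).copied().unwrap_or(f64::NEG_INFINITY);
    let mut strictly_better = false;
    for k in a.keys().chain(b.keys()) {
        let (x, y) = (score(a, k), score(b, k));
        if x < y {
            return false;
        }
        if x > y {
            strictly_better = true;
        }
    }
    strictly_better
}

/// Add `candidate` to the front unless an existing member dominates (or equals) it;
/// evict any members it dominates. Returns whether it was added.
pub fn insert(front: &mut Vec<Scores>, candidate: Scores) -> bool {
    if front.iter().any(|p| dominates(p, &candidate) || *p == candidate) {
        return false;
    }
    front.retain(|p| !dominates(&candidate, p));
    front.push(candidate);
    true
}
```

This linear scan is quadratic in the front size, which is part of why the front is a reasonable performance-sensitive target.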

Stranger6667 commented 3 years ago

I don't have much to comment about "targeting" / "arbitrary" parts (I need to dive deeper into these areas), but there are some thoughts about other relevant things.

Database. This implementation comes from a more complete version. I adapted it to match the minithesis interface, but the original passes all non-Python-specific tests from the Hypothesis test suite, both on the Rust side and on the Python binding side. I also tried running the Hypothesis test suite with the DB classes replaced. Most of the tests passed, except ones that fail due to the ExampleDatabase subclassing check and some introspection issues (checking the data attribute in tests). Otherwise, it is a drop-in replacement, and the directory-based version is 1.5x-4x faster than the current implementation. The in-memory version is slower in some operations and faster in others. I assume this is mostly because at that scale (nanoseconds), converting Rust-side structures to Python ones takes long enough to be visible.

Python bindings. Here are bindings implemented with PyO3 for that database part. There is a bit of unsafe code, because I couldn't find another way to implement a proper iterator protocol :( Generally, there are some corner cases with PyO3 that imply some performance overhead. E.g., it doesn't support generics or lifetimes (other than 'static) in structs, but this can probably be worked around with some manual (and potentially unsafe) reference handling. Otherwise, it supports Python 3.6-3.9 and Linux/macOS/Windows. Also, I have an article on implementing Python bindings for Rust crates, with much more technical detail, which may be relevant to implementing this kind of interop.

Or you might want to try translating a more isolated part first, e.g. the Pareto front logic is self-contained and performance-sensitive.

To me, it sounds like a good way to go. I found that the charmap implementation is quite isolated as well. The nice part is that there is already a crate that generates static tables with Unicode interval data, so even an incomplete implementation is an order of magnitude faster than the current one (leaving caching aside). Having static Unicode interval tables on the Python side would give similar performance, though.
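The speed-up described above comes from shipping Unicode data as precomputed static tables. A minimal sketch of the lookup side, assuming a sorted table of disjoint, inclusive code-point ranges searched with binary search. `DECIMAL_DIGITS_EXCERPT` is a tiny hand-written excerpt (three runs of decimal digits) for illustration only, not generated data and not any particular crate's API.

```rust
use std::cmp::Ordering;

/// Sorted, non-overlapping, inclusive code-point ranges. A generated table
/// would cover a whole category; this is a three-range excerpt of decimal digits.
pub const DECIMAL_DIGITS_EXCERPT: &[(u32, u32)] = &[
    (0x0030, 0x0039), // DIGIT ZERO..DIGIT NINE
    (0x0660, 0x0669), // ARABIC-INDIC DIGIT ZERO..NINE
    (0x0966, 0x096F), // DEVANAGARI DIGIT ZERO..NINE
];

/// Membership test in O(log n) with no allocation: find the range containing `c`.
pub fn in_table(table: &[(u32, u32)], c: char) -> bool {
    let cp = c as u32;
    table
        .binary_search_by(|&(lo, hi)| {
            if hi < cp {
                Ordering::Less // whole range lies below cp
            } else if lo > cp {
                Ordering::Greater // whole range lies above cp
            } else {
                Ordering::Equal // lo <= cp <= hi
            }
        })
        .is_ok()
}
```

Since the table is a `const`, it lives in the binary and needs no start-up computation or on-disk cache, which is the property that makes the static approach attractive.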

Minithesis. Perhaps we can submit a PR to add this port to the ports list? There are also a few public API ideas that could be implemented (the builder pattern and a proc-macro), but it seems like the core parts are implemented, and those API ideas are not critical. What would implementing the conjecture backend look like in terms of code organization? A separate crate within minithesis, or maybe a separate repo? I could suggest something like ripgrep, which is a multi-crate project within a single workspace.

Btw, I use some copy-pasted and adapted Hypothesis code in my experiments repo; I keep the license header, but I'm not completely sure it is alright to do so. Let me know if it is not OK, and I'll remove it :)

Rik-de-Kort commented 3 years ago

Seems like we're very complementary, @Stranger6667, because I was busy looking into arbitrary and friends! I think it's very doable to support the arbitrary crate, but I'm not sure it's going to be very useful in the short term, since the underlying libFuzzer is mostly implemented in C++. QuickCheck's Arbitrary requires a type-defined shrinker, so maybe we could look into using that down the road. proptest is basically Hypothesis, so its Arbitrary is essentially our Possibility, or what it will be down the road.
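In minithesis, a `Possibility` wraps a `produce` function from a test case to a value, while Arbitrary-style traits attach a default generator to a type. A minimal sketch of how a type-based default could sit alongside customised possibilities, assuming a bare replayed choice stream (`Source`) in place of a real test case. `Source`, `map`, and the `HasPossibility` trait are hypothetical names for illustration, not the API of the arbitrary, proptest, or quickcheck crates.

```rust
/// Stand-in for a test case: replays a fixed stream of choices.
pub struct Source {
    choices: Vec<u64>,
    pos: usize,
}

impl Source {
    pub fn new(choices: Vec<u64>) -> Self {
        Source { choices, pos: 0 }
    }
    /// Next choice, or 0 once the stream is exhausted (a simplification).
    pub fn draw(&mut self) -> u64 {
        let v = self.choices.get(self.pos).copied().unwrap_or(0);
        self.pos += 1;
        v
    }
}

/// Like minithesis's Possibility: a boxed function producing a value.
pub struct Possibility<T> {
    produce: Box<dyn Fn(&mut Source) -> T>,
}

impl<T: 'static> Possibility<T> {
    pub fn new(f: impl Fn(&mut Source) -> T + 'static) -> Self {
        Possibility { produce: Box::new(f) }
    }
    pub fn generate(&self, src: &mut Source) -> T {
        (self.produce)(src)
    }
    /// Customise an existing possibility by transforming its output.
    pub fn map<U: 'static>(self, f: impl Fn(T) -> U + 'static) -> Possibility<U> {
        Possibility::new(move |src: &mut Source| f((self.produce)(src)))
    }
}

/// Hypothetical Arbitrary-style trait: a default possibility chosen by type.
pub trait HasPossibility: Sized + 'static {
    fn possibility() -> Possibility<Self>;
}

impl HasPossibility for u8 {
    fn possibility() -> Possibility<u8> {
        // Truncate the raw choice to the low 8 bits.
        Possibility::new(|src: &mut Source| src.draw() as u8)
    }
}
```

The type-derived default is then just a starting point that `map` (and similar combinators) can customise, rather than the only way to describe data.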

I like the idea of implementing the Pareto front! Maybe we should tackle things in parallel?

I'm down for the PR to minithesis (MT). I'll get to it later today.

Zac-HD commented 3 years ago

@Rik-de-Kort Nice! Support for Arbitrary is definitely optional, but as I understand it, there's been a reasonable amount of work on generating things via those traits, and leveraging it is probably a good way to make things easier for early adopters.

@Stranger6667 also very nice!

I don't think it's worth adding native code to Hypothesis just for the database, but once we're adding it for other things we might as well include it.
If or when Hypothesis-for-Python is using Rust we'll want it all to live in this repo, though probably organised as separate libraries. I don't have a good sense for how that should work yet, sorry!
For charmap, just be careful about which version of Unicode is in use: different Python versions use different Unicode versions, which means that the set of available characters and the behaviour of those characters is not always stable. If the Rust side supports several, we can specify one via unicodedata.unidata_version, though.
Using Hypothesis code is fine; you just need to stick to the MPL licence (it's basically file-level copyleft).

Stranger6667 commented 3 years ago

@Rik-de-Kort

I like the idea of implementing the Pareto front! Maybe we should tackle things in parallel?

Agree! I have some prototypes for intervalset.py & charmap.py, but probably we can think about some kind of roadmap to coordinate all the efforts.
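For readers unfamiliar with `intervalset.py`: it stores characters as sorted ranges and lets callers index into them as one flat sequence, so a character can be drawn by choosing an integer index. A minimal Rust sketch of that indexing, assuming inclusive `u32` ranges that are already sorted and disjoint, and using a prefix-sum array with binary search for O(log n) lookup (a design choice of this sketch, not necessarily what the Python class does). `IntervalSet::new` and `get` are hypothetical names.

```rust
/// Sorted, disjoint, inclusive ranges, indexable as one flat ascending sequence.
pub struct IntervalSet {
    intervals: Vec<(u32, u32)>,
    /// offsets[i] = number of values in intervals[..i].
    offsets: Vec<u64>,
    size: u64,
}

impl IntervalSet {
    /// Assumes `intervals` is sorted, disjoint, and every range has lo <= hi.
    pub fn new(intervals: Vec<(u32, u32)>) -> Self {
        let mut offsets = Vec::with_capacity(intervals.len());
        let mut size = 0u64;
        for &(lo, hi) in &intervals {
            offsets.push(size);
            size += u64::from(hi - lo) + 1;
        }
        IntervalSet { intervals, offsets, size }
    }

    /// Total number of values across all ranges.
    pub fn len(&self) -> u64 {
        self.size
    }

    /// The `i`-th value in ascending order, or None if `i` is out of range.
    pub fn get(&self, i: u64) -> Option<u32> {
        if i >= self.size {
            return None;
        }
        // Index of the last range whose starting offset is <= i.
        let j = self.offsets.partition_point(|&off| off <= i) - 1;
        let (lo, _) = self.intervals[j];
        Some(lo + (i - self.offsets[j]) as u32)
    }
}
```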

@Zac-HD

Thanks for the feedback! I am definitely inclined to build some significant blocks first, before considering further steps - the DB part was quite isolated from everything else :) For the code organization part, I don't know yet either, but I'll keep the code isolated so it is easier to move around in the future :)

For charmap, just be careful about which version of Unicode is in use

Yep, I added versions 9.0 - 13.0, but it is trivial to add more since there is already a script for that :)

Using Hypothesis code is fine; you just need to stick to the MPL licence (it's basically file-level copyleft)

Cool! Good to know that :)

If we could talk about long-term plans for the Rust implementation, then I think it would be nice to discuss expectations.

What are the decision points for whether some code will eventually become part of Hypothesis or not? E.g., what conditions should be met for some code to be considered good enough to integrate?
Which parts would be enough to start the integration?

Generally, there are a lot of aspects regarding maintainability, CI, new tooling, bindings, etc. that need to be discussed at some point, but probably not now.

Zac-HD commented 3 years ago

Most of these decisions, especially about longer-term plans, will need to be approved specifically by @DRMacIver. It sounds to me like we're on a good path though :smiley:

amw-zero commented 3 years ago

If someone could clarify: why is minithesis-rust preferable to using the existing conjecture-rust? Does minithesis implement newer versions of Hypothesis's algorithms, or something like that?

The reason I ask is that I'm primarily interested in reviving hypothesis-ruby, and I was playing around with simply replacing the current Ruby bindings to conjecture-rust with Rutie, which is actively maintained. I have some basic examples working through the Rust binding layer (any(integers) and any(arrays(of: integers)) are partially working) in a branch here.

I'd love to start using hypothesis-ruby and am willing to help out with any endeavors there. But if conjecture-rust is itself deprecated, I would stop trying to replace the ruby binding layer to it and help out with other areas.

Zac-HD commented 3 years ago

Hey @amw-zero! As I understand it updating the bindings would be sufficient to get hypothesis-ruby working again and we'd be delighted to accept a PR which does so :grin:

On the Python side, we've simply accumulated a lot more fiddly backend code that would need implementing before we could switch over from conjecture-python to conjecture-rust, so implementing smaller parts gives us a smoother migration pathway. I don't think anyone wants to review a 20,000-line diff to the core of the library, but incremental steps are both manageable and useful!

IIRC conjecture-rust is also more adapted to the Ruby frontend than a direct port of conjecture-python would be, so we'd have to work out a much cleaner cross-language interface layer. That's definitely going to happen at some point, but I think it makes sense to e.g. sort out our build system and CI, and move some slow data structures into Rust, before porting the Python backend entirely.

amw-zero commented 3 years ago

Ok cool! I didn't want to hijack the thread, but my question is also related to the status of conjecture-rust. It sounds like what I'm working on is unrelated to the overall move towards Rust, though maybe one day all of the platforms could share a common core. Agreed that that would need to happen incrementally, though.

Zac-HD commented 3 years ago

I would call it related - a common core is definitely the goal, and incremental steps the plan. The best way to work out what that common core should be is probably to implement the parts we need for each though, and then refactor + unify them later.

Stranger6667 commented 3 years ago

Closing this, as conjecture-rust is now alive and well: we have CI jobs running for code formatting, linting & tests :)

We'd be happy to see issues & PRs for features and improvements for conjecture-rust :)

Thank you, everybody! :)

HypothesisWorks / hypothesis

Status of conjecture-rust? #2632