`#[pyclass]` trait ergonomics

PyO3 / pyo3

Rust bindings for the Python interpreter

https://pyo3.rs

Apache License 2.0

12.06k stars 744 forks source link

`#[pyclass]` trait ergonomics #4207

Open davidhewitt opened 4 months ago

davidhewitt commented 4 months ago

Following up to #4202 and #4206 I wanted to write down something I've been thinking about for a while. It looks like we're moving ahead with accepting those options so I'll assume that at least the design choice to have these options is accepted.

In summary, I think there are at least five convenience options to add to #[pyclass] to automatically expose Python magic methods based on Rust trait implementations:

[ ] #[pyclass(eq)] for equality based off PartialEq - #4210
[ ] #[pyclass(ord)] for ordering operators based off PartialOrd - #4202
[ ] #[pyclass(hash)] for hash based off Hash - #4206
[ ] #[pyclass(str)] for __str__ based off Display. I think we could also have #[pyclass(str = "<format>")] to specify a format-args-compatible string concisely.
[ ] #[pyclass(repr)] for __repr__ based off Debug. Similarly to the above I think we could have #[pyclass(repr = "<format>")].

Based off the recent PRs I think the basic implementation of these is more or less agreed upon, possibly excluding the extra argument on the string formatting options.

As we proceed with / make progress on the above, I think there are two further steps which make sense to explore:

Simple enums currently generate eq and repr automatically. I think we should be deprecating the automatic generation and start requiring opt-in to the new behaviour, to be consistent with everything else. I think it's relatively easy to do this as we just need to emit the deprecation warning if the eq or repr arguments weren't passed.
I think a future extension would be to have #[pyclass(init)] to automatically generate a constructor (like we already do for complex enums) and then #[pyclass(dataclass)] which does some or all of the above conveniences (maybe depending on whether the class is frozen).

Zyell commented 4 months ago

@davidhewitt I would like to start exploring the str task. I think if we land eq first, I can rebase the work I did in ord to integrate it with what @Icxolu did for eq. I can migrate in the tests and doc updates and add the extra match arms pretty easily at that point. I don't think this should have any impact on the str work, but please correct me if I missed something. You did make a comment reagarding naming the methods. I take it we should adhere to this for all of these features? If there is anything else I should look out for let me know! Thanks!

Zyell commented 4 months ago

@davidhewitt You mentioned passing formatting arguments to str and repr. Is there a benefit to that vs having the user directly implement Display and Debug traits where they would have full formatting control?

davidhewitt commented 4 months ago

@Zyell sounds great, please do feel welcome to take on str!

You did make a comment reagarding naming the methods. I take it we should adhere to this for all of these features? If there is anything else I should look out for let me know! Thanks!

Yes, I think we can handle that to make the duplication check for all of these functions. I think #4210 now has working logic to do that.

You mentioned passing formatting arguments to str and repr. Is there a benefit to that vs having the user directly implement Display and Debug traits where they would have full formatting control?

I think by default we should use the traits for convenience, but if the format string is passed use that instead.

I think it's common to want the text formatting to be a bit different between Python and Rust. I think this is particularly true for __repr__ where I think users may want MyType(arg1=<thing>, arg2=<thing>) syntax in Python and MyType { arg1: <thing>, arg2: <thing> } syntax in Rust.

#[pyclass(str)]  // uses Display
struct Foo {} 

#[pyclass(str = "hello {name}")]  // uses custom formatting string
struct Person {
    name: &str
}

As for the variables permitted in these custom formatting strings, I think prior art would be in how something like thiserror handles them.

Zyell commented 4 months ago

@davidhewitt That makes sense. Thank you for the pointer to the thiserror library, that approach looks really nice.

Zyell commented 3 months ago

@davidhewitt I have finished the __str__ implementation assuming you are happy with the direction. If so, I would like to work on __repr__. I setup the __str__ implementation to have the formatting be generic enough to apply to the __repr__ case. I was planning on adding a deprecation warning to the current __repr__ implementation on simple enums in favor of the explicit repr argument. Otherwise, the implementation will be effectively the same as the __str__ implementation, but will default to using the Debug trait implementation. Let me know if I should proceed in the current direction. Thanks!

davidhewitt commented 3 months ago

Thanks @Zyell, the __str__ implementation is looking great!

Given what we're discussing in that review, I'm having a bit of self-doubt whether it's really useful at all for Debug to be a default for __repr__.

I'm now wondering whether if we had #[pyclass(repr)], would we be better off auto-generating a custom implementation which matches what @dataclass might do in Python?

e.g. something like the below

#[pyclass(repr)]
struct Point {
   x: i32,
   y: i32,
   #[pyo3(repr = false)]  // skip this field in the repr
   metadata: Metadata
}

might generate a repr Point(x=1, y=2).

While it would be a more complex implementation, I'd suspect this is what users probably want in most cases? I guess the problem is that we would still have to use Rust's Debug trait for each field, which wouldn't be nice if we for e.g. a PyObject field where the Debug isn't very nice and users would probably expect the Python __repr__ to be used for that too. What about Vec<PyObject>? This might be solvable to some extent automatically, however it'd probably need an element of user customisation.

... all in all, it feels like there are design possibilities with repr which I perhaps too naively ignored above. As usual, I suppose there are two steps:

What's the "ideal" design?
What could we implement as a first step which is compatible with that?

... or is this too complex and we should back out and leave users to write __repr__ themselves?

What I'm now wondering is, does this have implications for #[pyclass(str)] using Display and #[pyclass(str = "format")]? I think probably not, given that ord, eq, hash all use traits and work nicely.

Zyell commented 3 months ago

@davidhewitt I like the idea of being able to disable fields from the output! I use that all the time in dataclasses.

I still like the idea of tying this to the Debug trait for reasons I mentioned in my last comment on the str implementation. However, the PyObject case or the vector of PyObjects might be problematic. I think it is reasonable for the user to expect it to use its __repr__ implementation as well. I have not played with these cases before to see how to best handle them. I think that this case would need to be handled for str as well. Could this be nested arbitrarily deep?

I think the ideal design would be something like this:

Match Debug implementation as a default
Handle PyObject cases by calling their __repr__ implementations (this might have consequences for the above case though...).
Allow for fields to be removed from the repr.

I will explore the PyObject case you pointed and see what we might do about it before starting an implementation or repr so we have a better idea about what makes sense to do. Thanks for the insight!

Zyell commented 2 months ago

@davidhewitt I took a closer look at repr and it is looking rather complex to implement. I can navigate the AST just fine in the pyclass macro for the fields to implement a custom format. However, for nested objects, this quickly gets complex. There is no type information about the fields other than its named type (as far as I understand it this is a limitation from within this kind of macro). I could parse that looking for patterns like Py present in the a Vec or HashMap, etc., but would need to implement cases explicitly. But if anything uses the newtype pattern, there doesn't seem to be a way to get at the actual type. If the fields refer to other structs or enums, it doesn't seem possible to know if those fields are structs or enums to know how to format them generically (all that seems accessible is the name of the struct or enum as the type). In the following case, we couldn't determine what Label is to know how to format it, the type in the parsed AST is just "Label":

#[derive(PartialEq, Debug)]
struct Label {
    a: u32
}

#[pyclass(repr)]
#[derive(PartialEq, Debug)]
struct Point4 {
    name: String,
    msg: String,
    idn: u32,
    label: Label,
}

The Formatter on the Debug trait implementation gets around this by delegating the formatting to all fields, requiring their types to implement Debug all the way down the chain.

I had hoped that the Formatter might take arguments such that we could override some of the separators like changing field: value to field=value and changing some { into (. This would match the python dataclass repr representation in nearly all cases. Unfortunately, these are hardcoded within the Formatter.

Assuming we could get the necessary type information to traverse these fields, we could delegate most of the formatting to the Debug implementation for collections and primitives. For PyObject types, we could call the repr implementation through Python (seems a little wonky, but I did test that it would work, but maybe there is a better way?).

Without that information though, I'm not sure we could generically and automatically implement a repr that matches the Python dataclass implementation.

I'm happy to explore this further if you have some additional insights and ideas!