Open davidhewitt opened 4 years ago
Hitting #1088 when I'm trying to expose crfs::Model which has a lifetime parameter to Python. It would be great if this code works when data
is a Python managed buffer.
#[pyclass]
struct Model<'a> {
data: &'a [u8], // or maybe &'a PyBytes
model: crfs::Model<'a>,
}
Last time I looked at this I concluded that having the lifetime parameter on the #[pyclass]
is impossible, however something along this kind of line could work:
#[pyclass]
struct PyModel {
data: PyBytes
accessor: Box<dyn for<'py> Fn(Python<'py>) -> crfs::Model<'py>>
}
Rather than Box<dyn Fn>
it would be nice to have a trait, but I think it might require generic associated types.
I haven't thought too hard in this space. I'm sure that something can be possible but it just needs research!
That looks a lot like what ouroboros does for self-referential struct.
Looks interesting 👀
Maybe we can extend #[new]
to be enabled to take Python<'py>
.
I was able to get something working with ouroboros using self-referential struct to remove the lifetime parameter: https://github.com/messense/crfs-rs/pull/3
That's really cool! I think we might be able to write an iterator example using that 🚀 (#1085)
So, the natural extension of this would be if, instead of copying to a Vec<u8>
, we could simply do a borrow from a Py<PyBytes>
. I believe this should be safe for the same reasons.
The hard bit is how to correctly constrain this to python-owned data. For example, &[u8]
isn't enough; if we enabled something like the below I can easily write unsound code:
#[pyclass]
struct WithBorrowedData<'py> {
py: Python<'py>,
data: &'py [u8],
}
// trivial to create a function which invalidates lifetimes
#[pyfunction]
fn use_after_free(py: Python) -> PyResult<PyObject> {
let data = vec![];
Py::new(WithBorrowedData { py, data: data.as_slice() }.into()
}
// The return value of use_after_free contains an invalid data reference
I've been thinking that PyRef
(or something similar) would need to be used as a wrapper. Maybe a PyRef::map
function would be useful.
Tbh this potentially also ties in with #1308 and how we represent owned / borrowed python data in general.
Yes, I think the desired behavior is something like:
#[ouroboros::self_referencing]
struct Foo {
owner: Py<PyBytes>,
#[borrows(owner)]
slice: &'this [u8]
}
#[pyfunction]
fn f(data: Py<PyBytes>) {
Foo::new(data, |data| data.as_bytes())
}
Effectively the same pattern as https://github.com/pyca/cryptography/blob/main/src/rust/src/ocsp.rs#L32-L42, except without copying to a Vec
.
This should be safe because if you have a Py<PyBytes>
you own a reference to it, and PyBytes are immutable, so it will never be freed behind your back.
If I understand correctly, reason that the above doesn't currently work with #[ourobouros::self_referencing]
is because the slice access closure doesn't have access to a Python
GIL token? It seems like that should be solvable by providing an equivalent of ourobouros::self_referencing
inside PyO3, but that is probably a lot of work...
Yeah, the ideal thing is probably finding a way for them to play nicely together. I'm not sure how complicated that is.
On Sun, Jun 27, 2021 at 5:04 PM David Hewitt @.***> wrote:
If I understand correctly, reason that the above doesn't currently work with #[ourobouros::self_referencing] is because the slice access closure doesn't have access to a Python GIL token? It seems like that should be solvable by providing an equivalent of ourobouros::self_referencing inside PyO3, but that is probably a lot of work...
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/PyO3/pyo3/issues/1089#issuecomment-869223389, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAAGBAYPDQ65FYSC3JKMSTTU6G4RANCNFSM4PXQQD5A .
-- All that is necessary for evil to succeed is for good people to do nothing.
This is my attempt at allowing a PyClass
to hold a reference to another PyClass
.
I think it is safe, but not sure if it has other issues.
https://gist.github.com/b05902132/de18debe9039473aa9b0bed6b48436c8
As an example, I wrote a simple class backed by Vec
and its iterator. Hope this helps.
👍 thanks, that's very interesting and roughly along the lines of what I was thinking. It would be nice to be able to handle that mapped value as a real Python object too. That'll provide a way for us to eventually avoid cloning in #1358
(This will probably require lots of changes to the internal of PyCell
.)
Hitting this again in https://github.com/messense/tree-sitter-py/blob/05d5adde312afd22c8ac9f59f07731e17cf9a71a/src/lib.rs#L116-L122 but this time ouroboros
can't help. The error message when uncommented is
error[E0495]: cannot infer an appropriate lifetime for autoref due to conflicting requirements
--> src/lib.rs:118:18
|
118 | self.borrow_cursor().node()
| ^^^^^^^^^^^^^
|
note: first, the lifetime cannot outlive the anonymous lifetime defined on the method body at 116:13...
--> src/lib.rs:116:13
|
116 | fn node(&self) -> PyNode {
| ^^^^^
note: ...so that reference does not outlive borrowed content
--> src/lib.rs:118:13
|
118 | self.borrow_cursor().node()
| ^^^^
note: but, the lifetime must be valid for the anonymous lifetime #1 defined on the body at 117:49...
--> src/lib.rs:117:49
|
117 | PyNode::new(self.borrow_tree().clone(), |tree| {
| _________________________________________________^
118 | | self.borrow_cursor().node()
119 | | })
| |_________^
note: ...so that the expression is assignable
--> src/lib.rs:118:13
|
118 | self.borrow_cursor().node()
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^
= note: expected `Node<'_>`
found `Node<'_>`
Basically tree-sitter has aTree
struct that can produce a Node<'_>
and a TreeCursor<'_>
, then TreeCursor<'_>
can also produce a Node<'_>
, it use PhantomData
to enforce lifetime variance.
Note to self: when looking at the internals of #[pyclass]
to make this possible, also think about use of Box
/ Arc
and how is might impact adding conversion implementations for them, xref #3014 #3729
FYI other Mercurial developers and I added this feature (or at least something similar) in rust-cpython
back in 2019: http://dgrunwald.github.io/rust-cpython/doc/cpython/struct.PySharedRefCell.html
I don't know if something similar can be achieved, but I figured you'd like to know. I would certainly need this feature to actually use PyO3 for non-trivial classes within the context of Mercurial.
Fwiw the thing I described above, borrowing from a PyByted works, and is what cryptography uses these Days
From the linked docs, it seems that using this requires unsafe code? If so, one can already use unsafe code to replace lifetimes by 'static
. I think this issue is about providing a safe interface to achieve this.
We have some unsafe related to other parts of our ouroborous stuff, but just having a strict that borrows a PyByted is safe
On Mon, Feb 12, 2024, 12:15 PM Adam Reichold @.***> wrote:
From the linked docs, it seems that using this requires unsafe code? If so, one can already use unsafe code to replace lifetimes by 'static. I think this issue is about providing a safe interface to achieve this.
— Reply to this email directly, view it on GitHub https://github.com/PyO3/pyo3/issues/1089#issuecomment-1939179165, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAAGBCBMARJUTQ6YBUCVKLYTJE2FAVCNFSM4PXQQD5KU5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCOJTHEYTOOJRGY2Q . You are receiving this because you commented.Message ID: @.***>
Sorry, this was us racing GitHub comments. I was referring to PySharedRefCell
apparently requiring unsafe
.
At the moment, as #1088 notes (and also #1085), it's difficult to expose things like iterators to Python because of the restriction that
#[pyclass]
cannot take a lifetime.This restriction obviously makes sense because Rust-managed data cannot be borrowed and then exposed to Python; Python will allow this data to live arbitrarily long.
However, data that's already owned by Python, e.g. in a
PyCell<T>
, can be safely "borrowed" and stored in another#[pyclass]
. We do this already, but only for the lifetime-free casePy<T>
.It would be incredibly powerful if
PyRef<'a, T>
(or some similar construct) was able to be stored inside another#[pyclass]
. This would make make wrapping arbitrary iterators, for example, much easier.This playground shows that with some carefully-placed transmutes it's possible to hack the type system to at least achieve the desired affect: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=041f9a6f5dfa3a5fc59ece7105cc4dfd
That code is a wildly unsafe sketch that doesn't actually use the PyO3 types, so please don't try to use it as-is. You're responsible for all segfaults if you do 😄. If you're finding yourself in need of this feature, I can offer mentoring to help design how PyO3 could make such a trick safe and ergonomic.