Open kylebarron opened 4 days ago
I'm curious how python handles the lifetimes present in the futures and if it possible to do something similar for streams?
Currently all the futures have a non-static lifetime, but your comments indicate it is only the streams where this causes issue?
To the extent I understand it's okay with futures because they're evaluated immediately, either asynchronously via https://github.com/PyO3/pyo3-async-runtimes or synchronously via a tokio runtime. It matters whether the result of the future can be materialized as data and presented to Python or whether it should be preserved as a stream.
With the stream returned from ObjectStore::get
, I'm storing the BoxStream
itself in a struct exported to Python https://github.com/developmentseed/object-store-rs/blob/c157fe2cb8da14c1e8eee297014bb4695d683cc3/object-store-rs/src/get.rs#L111-L115. Then when the Python async iteration calls __aiter__
, that creates a new future based on polling the stream.
But I think this is only possible because BoxStream
is 'static
. For another lifetime, you'd have to annotate the struct containing the stream with that lifetime, which pyo3 immediately rejects.
Ok so if I follow, this would mean that in order for tokio to poll the stream, the python async code would need to invoke it in a timely manner? Or to put it differently, if the python code is busy doing something else, the tokio work would get starved. I wonder if this is going to lead to the same sorts of issues we've run into multiplexing CPU bound tasks on the same threadpool? Perhaps there needs to be some sort of buffering between the two regardless, which would obviate any lifetime shenanigans? The same issue would also potentially apply to regular futures
TBC I am not opposed to changing the signature, just trying to ensure the python bindings work as well as possible 😅
I'm not fully clear on all the details on how the pyo3-async-runtimes integration works. From its readme:
in pyo3-asyncio, we decided the best way to handle Rust/Python interop was to just surrender the main thread to Python and run Rust's event loops in the background.
And I know there's revamped async work going on in https://github.com/PyO3/pyo3/issues/1632 and https://github.com/wyfo/pyo3-async, which is suppose to remove some overhead of integrating rust and python async.
To answer your questions:
Ok so if I follow, this would mean that in order for tokio to poll the stream, the python async code would need to invoke it in a timely manner? Or to put it differently, if the python code is busy doing something else, the tokio work would get starved.
I think so, yes. At least in my current implementation of this, if __anext__
is never called, then the tokio future's next()
is never called. So it relies on Python to drive this async.
I wonder if this is going to lead to the same sorts of issues we've run into multiplexing CPU bound tasks on the same threadpool?
Potentially, but I'd say this is up to the user to ensure the Python async runtime doesn't get blocked, and that they're not running CPU bound tasks on the main thread.
Perhaps there needs to be some sort of buffering between the two regardless, which would obviate any lifetime shenanigans?
I'm not sure how to implement this 😅 , or else I'd try to implement it now.
just trying to ensure the python bindings work as well as possible
Trying to think of other things on my wish list to mention for the Python bindings...
One thing that would be great but I certainly don't expect to change is if object_store used something like an arrow::buffer::Buffer
, which permits externally-allocated memory. Right now in a put
we always have to make a copy from Python-backed memory to Rust-backed memory first. Or is there some way to use external memory with a bytes::Bytes
?
Or is there some way to use external memory with a bytes::Bytes?
There have been various discussions about exposing the bytes vtable externally, but they've not gotten anywhere AFAICT. This would really be my preferred path, it seems unfortunate to break from the ecosystem with our own wrapper
it seems unfortunate to break from the ecosystem with our own wrapper
Absolutely agree, even though it's unfortunate for my use case.
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I'm writing a new Python binding for the
object-store
crate (differences from the existing one detailed here).I'm trying to present streaming APIs to the user instead of materializing an entire result stream upfront. This worked for
get
, where we can expose thestream
returned byGetResult::into_stream
as a Python async iterable.When I tried to do present a streaming result API for
list
, I tried to cast the result ofObjectStore::list
toand got that the
store
does not live long enough.It's not possible to use a lifetime other than
'static
in a struct exported to Python, so I'm not sure if it's possible to wraplist
as a stream currently.@tustvold mentioned in discord that it may be possible to change other methods to use
'static
in the next major object-store release.Describe the solution you'd like
The most straightforward solution would be to change the
ObjectStore
trait to use the'static
lifetime. But I'm open to other solutions. I don't fully understand async lifetimes well enough to know all the implications here.Describe alternatives you've considered
Additional context