Open drwelby opened 1 week ago
I think there's significant potential in an async COG reader for Python. But personally I think the greatest potential would be to implement this in Rust with bindings to Python.
Rust has proven its potential as a language that's stable, easy to maintain, easy to bind to Python, and really fast.
Look at our project obstore. It's a Rust-powered Python library to interact with AWS/GCS/Azure from Python (GET/HEAD/LIST/DELETE, etc.). It's useful in its own right, but also useful as a benchmarking tool to see how fast the async Rust-Python integration can be. In https://github.com/geospatial-jeff/pyasyncio-benchmark, @geospatial-jeff is working on benchmarks for obstore, and his early results indicate that it may enable significantly higher throughput than aioboto3.
This is consistent with the initial results that Earthmover found in Icechunk, that Zarr v3 with Icechunk as the IO layer can be 2x faster than Zarr v3. The mechanism for this is likely that Icechunk is also using async Rust as the IO layer.
A Rust COG reader could potentially be even faster by decoding image data on a separate thread from the coroutine, so we could stack improvements in the IO and CPU layers.
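To make the "decode off the coroutine" idea concrete, here is a minimal pure-Python sketch of the pattern, not the Rust implementation. Everything in it is a stand-in: `TILES`, `fetch_tile`, `read_tile`, and `read_tiles` are hypothetical names, the "network" is an `asyncio.sleep`, and DEFLATE via `zlib` stands in for whatever codec a real COG tile uses. The point is only that the event loop keeps issuing fetches while decompression runs on worker threads.

```python
import asyncio
import zlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a remote COG: tile index -> DEFLATE-compressed bytes.
TILES = {i: zlib.compress(bytes([i]) * 65536) for i in range(8)}


async def fetch_tile(index: int) -> bytes:
    """Simulate an async range request for one compressed tile."""
    await asyncio.sleep(0.01)  # placeholder for network latency
    return TILES[index]


async def read_tile(index: int, pool: ThreadPoolExecutor) -> bytes:
    """Fetch one tile on the event loop, then decode it on a worker thread."""
    compressed = await fetch_tile(index)
    loop = asyncio.get_running_loop()
    # Decoding happens off the event-loop thread, so other fetches keep progressing.
    return await loop.run_in_executor(pool, zlib.decompress, compressed)


async def read_tiles(indices) -> list:
    """Read several tiles concurrently; results come back in request order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return await asyncio.gather(*(read_tile(i, pool) for i in indices))


if __name__ == "__main__":
    tiles = asyncio.run(read_tiles(range(8)))
    print(len(tiles), len(tiles[0]))
```

In Python this only helps when the decoder releases the GIL while it works, which is one reason doing both the IO and the decode in Rust looks attractive.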
I spent some time prototyping a Rust port of aiocogeo in https://github.com/developmentseed/aiocogeo-rs. I was able to fully read COG metadata but didn't yet implement data reads, so there's no benchmark yet. There's a separate geotiff Rust crate, but that builds on the tiff crate, which doesn't have async support. There's some ongoing discussion on this here (https://github.com/georust/geotiff/issues/13).
I'd love to push aiocogeo-rs forward to at least get some benchmarks and see if it's worth continuing development, but I'm working on a lot of projects and it's hard to dedicate time to it without funding.
I'm looking forward to hearing others' thoughts as well!
Thanks Kyle, there's some very exciting potential here. Let me catch up on all your links. Maybe the best route here is that we all leapfrog ahead and support the Rust work where we can.
For @geospatial-jeff but also @vincentsarago, @dmahr1, @kylebarron and other interested parties.
Are any of you still interested in working on this project? Async COG reading is quite useful to us Maxar folks, so if we get more active in its use and upkeep, do any of you still want to handle the top-level management: steering the project, accepting PRs, and such? We can certainly work in our own fork, but if there's interest in "handing this off", I'd be happy to discuss.