Open dazza-codes opened 3 years ago
@dazza-codes As far as I know, some work was done in ~2010, described in https://gdal.org/development/rfc/rfc24_progressive_data_support.html, for the JPEG2000/ECW formats.
This was implemented specifically for some drivers and not for VSI*. I'm not even sure libcurl natively supports async I/O operations.
libcurl has no asynchronous interface. You can do that yourself either by using threads or by using the non-blocking "multi interface" that libcurl offers. Read up on the multi interface here:
ref: https://curl.se/mail/lib-2002-05/0090.html
You may have to use something like https://github.com/jbaldwin/liblifthttp
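For what it's worth, the "do it yourself with threads" option from the curl mailing list can be sketched on the Python side roughly as below. This is only an illustration, not GDAL or libcurl code: `blocking_read` is a hypothetical stand-in for any blocking call (such as a /vsicurl/ read), and `time.sleep` merely simulates network latency. It assumes Python 3.9+ for `asyncio.to_thread`.

```python
import asyncio
import time


def blocking_read(path: str) -> str:
    """Hypothetical stand-in for a blocking read (not a real GDAL call)."""
    time.sleep(0.1)  # simulate network latency
    return f"data from {path}"


async def read_all(paths: list[str]) -> list[str]:
    # asyncio.to_thread runs each blocking call in a worker thread,
    # so the event loop stays free while the calls are in flight.
    tasks = (asyncio.to_thread(blocking_read, p) for p in paths)
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    print(asyncio.run(read_all(["a.tif", "b.tif", "c.tif"])))
```

The blocking work still occupies one thread per call; the event loop just doesn't have to wait on it directly.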
gdal should support asynchronous HTTP (and HTTP2 protocols) with asyncio patterns
What would be the use case? Most uses of /vsicurl/ and similar network filesystems currently go through other GDAL APIs, which are 99% blocking. As @vincentsarago pointed out, there is an async raster API, but it is only marginally used.
If GDAL API is blocking, async doesn't really add any value. Also I don't think the libcurl multi interface you reference is "truly async", it's more like client side multiplexing which still requires something waiting for the request to finish. In fact the idea of "truly async" programming languages is much newer than CURL so this isn't surprising to me.
A common way to implement "truly async" with CURL is with callbacks, as they allow code to run as requests finish, but I'm not familiar enough with GDAL to know whether using callbacks would be problematic (it usually is, e.g. JavaScript callback hell).
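To make the callback idea concrete, here is a minimal sketch using only the Python standard library. It is not how libcurl or GDAL do it: `blocking_fetch` is a hypothetical placeholder for a blocking HTTP request, and the thread pool stands in for whatever drives the transfers.

```python
from concurrent.futures import Future, ThreadPoolExecutor
import time


def blocking_fetch(url: str) -> str:
    """Hypothetical stand-in for a blocking HTTP request."""
    time.sleep(0.05)  # simulate network latency
    return f"body of {url}"


def fetch_with_callbacks(urls: list[str]) -> list[str]:
    results: list[str] = []

    def on_done(future: Future) -> None:
        # Runs as soon as the corresponding request finishes.
        results.append(future.result())

    with ThreadPoolExecutor(max_workers=4) as pool:
        for url in urls:
            pool.submit(blocking_fetch, url).add_done_callback(on_done)
    # Leaving the with-block waits for all submitted work to complete.
    return results
```

Note that results arrive in completion order, not submission order, which is part of what makes callback-driven code harder to follow.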
Suggestion: gdal might try to find and use an async lib and/or use C++11 std::async wrappers to support /aio* (to supplement /vsi*) for services that require an async client for HTTP/S, e.g. some related commentary in:
Suggestion to manage the experimental feature development (ignore if this is nonsense). The functionality could aim to provide additional features with no changes whatsoever to the existing libcurl and /vsi* functionality. Although I don't fully understand the intent of the /vsi* "namespace", perhaps a new "namespace" like /aio* could add experimental functionality to support async patterns, with exposure in python-swig to support asyncio. (Unfortunately I don't have time, nor know enough to begin a PR draft.)
With regard to use-cases, if it is not obvious already, e.g.:
asyncio.Semaphore and asyncio.sleep
await
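One reading of the use case above, sketched under assumptions: with an awaitable read, `asyncio.Semaphore` can cap how many requests are in flight at once. Nothing here is a GDAL API; `fetch_tile` is a hypothetical coroutine, and `asyncio.sleep` stands in for the network I/O.

```python
import asyncio


async def fetch_tile(tile_id: int) -> str:
    """Hypothetical async read; asyncio.sleep stands in for network I/O."""
    await asyncio.sleep(0.01)
    return f"tile-{tile_id}"


async def fetch_limited(tile_ids: list[int], max_concurrent: int = 2) -> list[str]:
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(tile_id: int) -> str:
        async with semaphore:  # at most max_concurrent reads in flight
            return await fetch_tile(tile_id)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(t) for t in tile_ids))


if __name__ == "__main__":
    print(asyncio.run(fetch_limited([1, 2, 3, 4])))
```

This kind of throttling is awkward to express when every read blocks the calling thread.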
(AFAIK) I agree that in an ideal world GDAL would have good support for async loading, but I'd guess it would be a considerable amount of work and might not happen without some dedicated funding.
Note that since you specifically reference Python and GeoTIFFs, you might want to follow the development of https://github.com/geospatial-jeff/aiocogeo
Via other channels, I just bumped into https://github.com/geospatial-jeff/aiocogeo - it supports asyncio patterns
gdal should support asynchronous HTTP (and HTTP2 protocols) with asyncio patterns. For example, all /vsis3 reads are synchronous, with no support for asyncio patterns to await an s3 read. If the asynchronous patterns are not supported by libcurl, use a different dependency to support them or a wrapper to support them.
e.g. https://gist.github.com/owickstrom/3218376 and https://stackoverflow.com/questions/11980311/libcurl-writecallback-async-c/11980430
(Apologies if gdal already supports asyncio patterns, happy to be corrected and pointed in the right direction. I don't use py-gdal directly, only rasterio wrappers on gdal.)