geopandas / geopandas

Python tools for geographic data
http://geopandas.org/
BSD 3-Clause "New" or "Revised" License

ENH: stabilize function `_read_file` for endpoints that redirect with different headers #3311

Open mattijn opened 1 month ago

mattijn commented 1 month ago

Fix #3284.

This pull request addresses the problem where redirects from jsdelivr lack necessary headers, causing unexpected failures. The issue is hard to reproduce.

Ideally, jsdelivr would provide the needed headers upon redirecting, but this PR also improves the code by separating the handling of fiona and pyogrio as backends, so it might still be useful.

Besides formatting changes, this PR introduces a new function named _url_supports_random_access(), which checks whether the Accept-Ranges header is "bytes". It tries a HEAD request first and falls back to a GET request if necessary (e.g. some endpoints have no HEAD method enabled). It uses partial data access if Accept-Ranges is supported (currently only for fiona), and retrieves the entire file using urllib if Accept-Ranges is not supported or if partial data access through fiona somehow still fails (as is the case in #3284).
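For readers following along, the check described above could look roughly like the sketch below. This is not the code from the PR: the function name without the leading underscore, the `timeout` default, and the injectable `urlopen` parameter (added only to make the sketch testable offline) are all assumptions, and the fallback here is triggered only by an HTTP error on the HEAD request.

```python
import urllib.error
import urllib.request


def url_supports_random_access(url, timeout=10, urlopen=urllib.request.urlopen):
    """Return True if the server advertises byte-range support for ``url``.

    Hypothetical sketch: tries HEAD first, then GET if HEAD raises an
    HTTP error (for example 405 when HEAD is not enabled).
    """
    for method in ("HEAD", "GET"):
        request = urllib.request.Request(url, method=method)
        try:
            with urlopen(request, timeout=timeout) as response:
                # Only the headers are inspected; the body is never read.
                accept_ranges = response.headers.get("Accept-Ranges", "")
                return accept_ranges.strip().lower() == "bytes"
        except urllib.error.HTTPError:
            # The server rejected this method: try the next one.
            continue
        except urllib.error.URLError:
            # Network-level failure: assume no random access.
            return False
    return False
```

A caller would then pick partial access when this returns True and download the whole file with urllib otherwise. Since urllib follows redirects by default, the headers inspected are those of the final response, which is the one that matters for the jsdelivr case.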

Btw, I think GDAL fails because of the missing Content-Length header, not because of a missing Accept-Ranges header.

[Fri May 17 09:41:57 2024].9412, 400.5738: VSICURL: GetFileSize(https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/earthquakes.jso1): response_code=404, server error msg=HTTP/2 404 

But as far as I have observed, if there is no Accept-Ranges defined then there is no Content-Length defined either.
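One way to check this observation for a given endpoint is to look at both headers on the final response after redirects. The helper below is only an illustrative sketch: the name `range_related_headers` and the injectable `urlopen` parameter are assumptions, not part of geopandas or of this PR.

```python
import urllib.request


def range_related_headers(url, timeout=10, urlopen=urllib.request.urlopen):
    """Report the headers relevant to partial reads for ``url``.

    Hypothetical helper: issues a HEAD request (urllib follows redirects
    by default) and returns the final URL together with the two headers
    discussed in this thread. A missing header is reported as None.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return {
            "final_url": response.geturl(),
            "Accept-Ranges": response.headers.get("Accept-Ranges"),
            "Content-Length": response.headers.get("Content-Length"),
        }
```

Running this a few times against the failing URL would show whether the two headers really disappear together on the redirected response.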

martinfleis commented 1 month ago

Not sure if this actually does the trick. See the log https://github.com/geopandas/geopandas/actions/runs/9323915991/job/25668099398#step:5:5174

jorisvandenbossche commented 4 weeks ago

> Uses a partial data access if Accept-Ranges is supported (currently only for fiona)

As I mentioned on the issue (https://github.com/geopandas/geopandas/issues/3284#issuecomment-2131268003), my understanding is that this should work just as well for pyogrio as it does for fiona (or, to phrase it differently, I don't understand why there would be a difference, since it is GDAL that is eventually failing).