OSGeo / gdal

GDAL is an open source MIT licensed translator library for raster and vector geospatial data formats.
https://gdal.org

/vsicurl/: fix to allow reading partitioned Parquet datasets from a public Azure container using /vsicurl/ #11310

Open rouault opened 1 day ago

rouault commented 1 day ago

Fixes #11309

mdsumner commented 1 day ago

I don't know if this is related, but I tried this fix branch because I don't think partitioned Parquet worked for me before. So why does this /vsis3 form work

ogrinfo --config AWS_S3_ENDPOINT projects.pawsey.org.au --config AWS_VIRTUAL_HOSTING NO --config AWS_NO_SIGN_REQUEST YES PARQUET:/vsis3/vzarr/oisst-avhrr-v02r01.parquet

INFO: Open of `PARQUET:/vsis3/vzarr/oisst-avhrr-v02r01.parquet'
      using driver `Parquet' successful.
1: oisst-avhrr-v02r01 (None)

but not the /vsicurl form ?

ogrinfo PARQUET:/vsicurl/https://projects.pawsey.org.au/vzarr/oisst-avhrr-v02r01.parquet

(is it settings on the bucket for raw-url use perhaps?)
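For comparison, here is a minimal sketch of how the two forms above can end up pointing at the same URL. It assumes path-style addressing (what AWS_VIRTUAL_HOSTING NO selects) and HTTPS. The helper name vsis3_to_url is hypothetical and is not part of GDAL.

```python
from urllib.parse import quote


def vsis3_to_url(vsi_path, endpoint, virtual_hosting=False, scheme="https"):
    """Map a /vsis3/ path to the HTTP URL of the same object.

    Hypothetical helper for illustration only; it is not a GDAL API.
    """
    prefix = "/vsis3/"
    if not vsi_path.startswith(prefix):
        raise ValueError("expected a path starting with /vsis3/")
    # The first path component is the bucket, the rest is the object key.
    bucket, _, key = vsi_path[len(prefix):].partition("/")
    key = quote(key, safe="/")
    if virtual_hosting:
        # Virtual-hosted style: the bucket is part of the host name.
        return f"{scheme}://{bucket}.{endpoint}/{key}"
    # Path style: the bucket is the first component of the URL path.
    return f"{scheme}://{endpoint}/{bucket}/{key}"


url = vsis3_to_url(
    "/vsis3/vzarr/oisst-avhrr-v02r01.parquet",
    "projects.pawsey.org.au",
)
print(url)
```

Under these assumptions both forms address the same resource, so the difference would presumably lie in how the directory is discovered: /vsis3 can query the S3 listing API, whereas /vsicurl only sees plain HTTP responses.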

rouault commented 1 day ago

> but not the /vsicurl form ?

Yes, same reason. But in the case of https://projects.pawsey.org.au/vzarr/oisst-avhrr-v02r01.parquet, there's nothing in the HTTP response headers that indicates it is an AWS directory (just a hint that it is a non-existent resource under an AWS bucket)...

$ curl -v -X HEAD https://projects.pawsey.org.au/vzarr/oisst-avhrr-v02r01.parquet
[....]
> HEAD /vzarr/oisst-avhrr-v02r01.parquet HTTP/1.1
> Host: projects.pawsey.org.au
> User-Agent: curl/7.68.0
> Accept: */*
> 
[....]
< HTTP/1.1 404 Not Found
< content-length: 218
< x-amz-request-id: tx000008893db01a3d16a91-00673dfb58-7d05beb-default
< accept-ranges: bytes
< content-type: application/xml
< date: Wed, 20 Nov 2024 15:08:08 GMT
[....]
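
The point made with this trace can be sketched as a small check on a HEAD response. This is an illustration only, not how GDAL implements it: the function name describe_head_response and the returned strings are hypothetical, and treating any x-amz-* header as a sign of an S3-like server is an assumption.

```python
def describe_head_response(status, headers):
    """Summarise what a HEAD response lets a client conclude.

    Hypothetical helper for illustration; not a GDAL API.
    """
    names = {name.lower() for name in headers}
    # Assumption: any x-amz-* header suggests an S3-compatible server.
    s3_like = any(name.startswith("x-amz-") for name in names)
    if status == 404:
        if s3_like:
            # Could be a directory-like prefix or a missing object;
            # the headers alone do not say which.
            return "not found on an S3-like server (directory or missing: unknown)"
        return "not found"
    if 200 <= status < 300:
        return "exists as a plain object"
    return "inconclusive"


# Status and headers copied from the curl trace above.
trace_headers = {
    "content-length": "218",
    "x-amz-request-id": "tx000008893db01a3d16a91-00673dfb58-7d05beb-default",
    "accept-ranges": "bytes",
    "content-type": "application/xml",
    "date": "Wed, 20 Nov 2024 15:08:08 GMT",
}
result = describe_head_response(404, trace_headers)
print(result)
```

With the headers from the trace, the sketch can only report a missing resource on an S3-like server, which matches the observation that nothing in the response identifies the path as a directory.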