Open scottyhq opened 1 year ago
It does have an effect, but it is mostly visible when using low-level I/O primitives, and much less with gdalinfo, which will still try to probe side-car files even if the initial directory listing is disabled.
Perhaps this could be rephrased as ?
Compare without list_dir=no, which attempts a GET on the directory containing the file:
$ CPL_CURL_VERBOSE=YES python -c "from osgeo import gdal; f = gdal.VSIFOpenL('/vsicurl?pc_url_signing=yes&url=https://landsateuwest.blob.core.windows.net/landsat-c2/level-2/standard/oli-tirs/2021/045/031/LC08_L2SP_045031_20210107_20210307_02_T1/LC08_L2SP_045031_20210107_20210307_02_T1_ST_B10.TIF', 'rb')"
* Couldn't find host landsateuwest.blob.core.windows.net in the .netrc file; using defaults
* Trying 20.150.76.4:443...
* TCP_NODELAY set
* Connected to landsateuwest.blob.core.windows.net (20.150.76.4) port 443 (#0)
* found 376 certificates in /etc/ssl/certs
* ALPN, offering h2
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_256_GCM_SHA384
* server certificate verification OK
* server certificate status verification SKIPPED
* common name: *.blob.core.windows.net (matched)
* server certificate expiration date OK
* server certificate activation date OK
* certificate public key: RSA
* certificate version: #3
* subject: CN=*.blob.core.windows.net
* start date: Sun, 25 Dec 2022 02:12:54 GMT
* expire date: Mon, 25 Dec 2023 02:12:54 GMT
* issuer: C=US,O=Microsoft Corporation,CN=Microsoft RSA TLS CA 02
* ALPN, server did not agree to a protocol
> GET /landsat-c2/level-2/standard/oli-tirs/2021/045/031/LC08_L2SP_045031_20210107_20210307_02_T1/ HTTP/1.1
Host: landsateuwest.blob.core.windows.net
User-Agent: GDAL/3.7.0
Accept: */*
* Mark bundle as not supporting multiuse
< HTTP/1.1 404 The specified resource does not exist.
< Content-Length: 223
< Content-Type: application/xml
< Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
< x-ms-request-id: 1db1b053-101e-0022-178b-381e13000000
< x-ms-version: 2014-02-14
< Access-Control-Expose-Headers: x-ms-request-id,Server,x-ms-version,Content-Length,Date,Transfer-Encoding
< Access-Control-Allow-Origin: *
< Date: Sat, 04 Feb 2023 11:22:58 GMT
<
[....]
and with list_dir=no, where the file is accessed directly (the request shown is actually the URL signing call):
$ CPL_CURL_VERBOSE=YES python -c "from osgeo import gdal; f = gdal.VSIFOpenL('/vsicurl?pc_url_signing=yes&list_dir=no&url=https://landsateuwest.blob.core.windows.net/landsat-c2/level-2/standard/oli-tirs/2021/045/031/LC08_L2SP_045031_20210107_20210307_02_T1/LC08_L2SP_045031_20210107_20210307_02_T1_ST_B10.TIF', 'rb')"
* Couldn't find host planetarycomputer.microsoft.com in the .netrc file; using defaults
* Trying 2620:1ec:4f:1::42:443...
* TCP_NODELAY set
* Connected to planetarycomputer.microsoft.com (2620:1ec:4f:1::42) port 443 (#0)
* found 376 certificates in /etc/ssl/certs
* ALPN, offering h2
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256
* server certificate verification OK
* server certificate status verification SKIPPED
* common name: planetarycomputer.microsoft.com (matched)
* server certificate expiration date OK
* server certificate activation date OK
* certificate public key: RSA
* certificate version: #3
* subject: C=US,ST=Washington,L=Redmond,O=Microsoft Corporation,CN=planetarycomputer.microsoft.com
* start date: Wed, 31 Aug 2022 00:00:00 GMT
* expire date: Wed, 30 Aug 2023 23:59:59 GMT
* issuer: C=US,O=DigiCert Inc,CN=DigiCert TLS RSA SHA256 2020 CA1
* ALPN, server accepted to use h2
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x271da90)
> GET /api/sas/v1/sign?href=https://landsateuwest.blob.core.windows.net/landsat-c2/level-2/standard/oli-tirs/2021/045/031/LC08_L2SP_045031_20210107_20210307_02_T1/LC08_L2SP_045031_20210107_20210307_02_T1_ST_B10.TIF HTTP/2
Host: planetarycomputer.microsoft.com
user-agent: GDAL/3.7.0
accept: */*
accept-encoding: gzip
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
< HTTP/2 200
< date: Sat, 04 Feb 2023 11:24:25 GMT
< content-type: application/json
< content-length: 531
< strict-transport-security: max-age=15724800; includeSubDomains
< request-context: appId=cid-v1:75161b1b-6883-4b66-9410-715040c44427
< x-azure-ref: 20230204T112425Z-zdmpvd196551r6z8qen6retaf400000001q0000000001t6v
< x-cache: CONFIG_NOCACHE
< accept-ranges: bytes
[...]
Seeing this, if pc_url_signing=yes is set, we should actually likely automatically disable directory listing as it can't work
Note that setting list_dir=no does not prevent higher-level logic in GDAL drivers from probing for individual side-car files.
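To make the two options discussed above easier to compare, here is a minimal sketch of how the /vsicurl? paths from this thread are put together. The helper name `vsicurl_path` and the example.com URL are hypothetical, not part of GDAL. It mirrors the commands above by placing `url` last and does no URL-encoding, which would matter if the URL itself contained characters such as `&`.

```python
def vsicurl_path(url, **options):
    """Build a /vsicurl? path from key=value options, with url placed last.

    Hypothetical helper for illustration only; it is not a GDAL function.
    """
    parts = [f"{key}={value}" for key, value in options.items()]
    parts.append(f"url={url}")
    return "/vsicurl?" + "&".join(parts)


# Placeholder URL (assumption); substitute a real asset URL.
URL = "https://example.com/data/scene_B10.TIF"

# Skips the initial directory listing at the low-level I/O layer.
low_level = vsicurl_path(URL, pc_url_signing="yes", list_dir="no")

# The option reported in this issue as also stopping side-car probing.
no_sidecars = vsicurl_path(URL, pc_url_signing="yes", empty_dir="yes")

print(low_level)
print(no_sidecars)
```

Either string can then be passed to gdal.VSIFOpenL or gdalinfo as in the commands above.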
Thanks for the clarification @rouault!
if pc_url_signing=yes is set, we should actually likely automatically disable directory listing as it can't work.
Makes sense to me. For what it's worth, the Planetary Computer JupyterHub automatically sets GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR.
That said, is there a reason not to reuse the list_dir key and add an additional value for empty_dir: list_dir=yes|no|empty_dir? Just from the docs, it's not clear whether all of these URL modifiers have corresponding environment variables and whether they override them. Happy to submit a PR to clarify the wording if that is helpful.
That said, is there a reason not to reuse the list_dir key and add an additional value for empty_dir: list_dir=yes|no|empty_dir?
Well, the GDAL_DISABLE_READDIR_ON_OPEN=YES/NO/EMPTY_DIR naming is quite hard to comprehend (double negation, and a non-boolean value EMPTY_DIR where the DISABLE prefix suggests a boolean), so the list_dir=yes/no & empty_dir=yes/no split was an (apparently bad) attempt at making things easier to understand.
Happy to submit a PR to clarify the wording if that is helpful.
welcome
Expected behavior and actual behavior.
https://gdal.org/user/virtual_file_systems.html#vsicurl-http-https-ftp-files-random-access
Describes the option to not list directories https://github.com/OSGeo/gdal/blob/dfc719107e07c8e157cbcbba00c0676668b685a3/doc/source/user/virtual_file_systems.rst?plain=1#L239
But looking at the log output, list_dir=no doesn't do anything, and instead empty_dir=yes has the intended effect.
Steps to reproduce the problem.
CPL_DEBUG=ON gdalinfo '/vsicurl?pc_url_signing=yes&list_dir=no&url=https://landsateuwest.blob.core.windows.net/landsat-c2/level-2/standard/oli-tirs/2021/045/031/LC08_L2SP_045031_20210107_20210307_02_T1/LC08_L2SP_045031_20210107_20210307_02_T1_ST_B10.TIF'
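To compare the debug logs side by side, the reproduction command can be generated for each variant mentioned in this thread. This is only a sketch: `gdalinfo_invocation` is a hypothetical helper, the example.com URL is a placeholder, and the script only prints the commands rather than running them, since running needs gdalinfo on PATH and network access.

```python
import os

# Placeholder URL (assumption); substitute the asset URL from the command above.
URL = "https://example.com/data/scene_B10.TIF"


def gdalinfo_invocation(option="", extra_env=None):
    """Return (argv, env) for a gdalinfo run with CPL_DEBUG=ON.

    Hypothetical helper for illustration; `option` is a single key=value
    URL option such as "list_dir=no", or "" for none.
    """
    query = "pc_url_signing=yes&" + (option + "&" if option else "")
    argv = ["gdalinfo", f"/vsicurl?{query}url={URL}"]
    env = dict(os.environ, CPL_DEBUG="ON", **(extra_env or {}))
    return argv, env


variants = {
    "list_dir=no": gdalinfo_invocation("list_dir=no"),
    "empty_dir=yes": gdalinfo_invocation("empty_dir=yes"),
    # The environment variable the Planetary Computer JupyterHub is said to set.
    "env var": gdalinfo_invocation(
        extra_env={"GDAL_DISABLE_READDIR_ON_OPEN": "EMPTY_DIR"}
    ),
}

for name, (argv, env) in variants.items():
    print(name, "->", " ".join(argv))
    # To actually run a variant:
    # import subprocess; subprocess.run(argv, env=env, check=True)
```

Diffing the three logs should show which variants still issue requests for side-car files.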
Operating system
OSX
GDAL version and provenance