OSGeo / gdal

GDAL is an open source MIT licensed translator library for raster and vector geospatial data formats.
https://gdal.org

OGR fails to open a VRT layer with a remote data source if VSI_CACHE=TRUE #3006

Closed agiudiceandrea closed 4 years ago

agiudiceandrea commented 4 years ago

While searching for the source of QGIS issue https://github.com/qgis/QGIS/issues/35016, and after testing various QGIS and GDAL versions on Windows and Ubuntu Linux, I found that OGR fails to open a VRT layer file with a remote data source (a CSV file hosted on the data.world server) if VSI_CACHE is set to TRUE.

A test VRT file is available at https://raw.githubusercontent.com/agiudiceandrea/TEST_CSV/master/covid19-regioni_dw.vrt

On both Windows (7 / 10 with GDAL/OGR 3.0.4 - 3.2.0dev) and Ubuntu Linux (18.04 / 20.04 with GDAL/OGR 2.2.3 - 3.0.4), if VSI_CACHE is not set or is set to FALSE

ogrinfo covid19-regioni_dw.vrt jgsghstpphjhicstradhy5kpjwrnfy -summary

outputs

INFO: Open of `covid19-regioni_dw.vrt'
      using driver `OGR_VRT' successful.
Warning 6: Unknown type : data
...
Layer name: jgsghstpphjhicstradhy5kpjwrnfy
Geometry: Point
Feature Count: 30069
Extent: (0.000000, 0.000000) - (18.171897, 46.499335)
Layer SRS WKT:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
denominazione_regione: String (0.0)

and QGIS can properly display the layer

while if VSI_CACHE is set to TRUE

ogrinfo covid19-regioni_dw.vrt jgsghstpphjhicstradhy5kpjwrnfy -summary

outputs

INFO: Open of `covid19-regioni_dw.vrt'
      using driver `OGR_VRT' successful.
ERROR 1: Failed to open datasource
`CSV:/vsicurl_streaming/https://query.data.world/s/jgsghstpphjhicstradhy5kpjwrnfy'.

Layer name: jgsghstpphjhicstradhy5kpjwrnfy
Geometry: None
Feature Count: 0
Layer SRS WKT:
(unknown)

and QGIS cannot display the layer.

The QGIS issue was spotted on Windows because the OSGeo4W QGIS installer sets VSI_CACHE=TRUE and VSI_CACHE_SIZE=1000000 by default. These settings appear to have been introduced 7 years ago (https://github.com/qgis/QGIS/commit/9222f152b30948a3a9d9f66dffd162548022f7fd) to work around a reported OGR performance issue (https://github.com/qgis/QGIS/issues/15688) with reading shapefiles over a network on Windows.
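Because the failure depends only on the value of VSI_CACHE, both cases can be reproduced per invocation instead of by changing the environment. The following is a minimal sketch, assuming GDAL's general `--config KEY VALUE` command-line option; the helper name `ogrinfo_command` is hypothetical, and it only builds the argument list without running anything.

```python
from __future__ import annotations


def ogrinfo_command(datasource: str, layer: str, vsi_cache: bool) -> list[str]:
    """Build an ogrinfo argument list with VSI_CACHE forced on or off.

    Hypothetical helper: it relies on the `--config KEY VALUE` general
    option of the GDAL command-line utilities to override the setting
    for a single run, leaving the process environment untouched.
    """
    return [
        "ogrinfo",
        "--config", "VSI_CACHE", "TRUE" if vsi_cache else "FALSE",
        datasource,
        layer,
        "-summary",
    ]


# The two cases described in this report.
VRT = "covid19-regioni_dw.vrt"
LAYER = "jgsghstpphjhicstradhy5kpjwrnfy"
working_case = ogrinfo_command(VRT, LAYER, vsi_cache=False)
failing_case = ogrinfo_command(VRT, LAYER, vsi_cache=True)
```

On a machine where `ogrinfo` is installed, either list can be passed to `subprocess.run` to compare the two outputs side by side.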

rouault commented 4 years ago

The issue is specific to this particular server: after following the redirection, a HEAD request returns an unreliable Content-Length: 0:

$ curl -I -L https://query.data.world/s/jgsghstpphjhicstradhy5kpjwrnfy 
HTTP/1.1 301 Moved Permanently
Date: Sat, 03 Oct 2020 21:27:44 GMT
Content-Length: 0
Connection: keep-alive
Server: nginx
Location: https://download.data.world/file_download/ondata/covid-19-italia-dati-dipartimento-protezione-civile/dpc-covid19-ita-province.csv?auth=eyJhbGciOiJIUzUxMiJ9.eyJzdWIiOiJwcm9kLXVzZXItY2xpZW50OnBpZ3JlY29pbmZpbml0byIsImlzcyI6ImFnZW50OnBpZ3JlY29pbmZpbml0bzo6YjNiYjc5YmItODA0NS00ZmY5LWI0NzAtY2ZhMGVhOTA4NDY3IiwiaWF0IjoxNTgzOTU3NjgxLCJyb2xlIjpbInVzZXIiLCJ1c2VyX2FwaV9hZG1pbiIsInVzZXJfYXBpX3JlYWQiLCJ1c2VyX2FwaV93cml0ZSJdLCJnZW5lcmFsLXB1cnBvc2UiOmZhbHNlLCJ1cmwiOiI2YWU4NWIyMTMwZjBiN2QxNjM4ZDY1MGUxMmIzNjc4ODFkMDJjMmY0In0.--2gOMnWY_iAkI21UBShHWmkmea8xwfro5fWQ22VuPd1XBvKJ2DnEtwGulHYM4H2C64eTtfUnjeBO6jxQMVKhQ
Vary: Origin

HTTP/1.1 200 OK
Date: Sat, 03 Oct 2020 21:27:46 GMT
Content-Type: text/csv
Content-Length: 0
Connection: keep-alive
Server: nginx
Content-Disposition: attachment;filename="dpc-covid19-ita-province.csv"
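The same check can be automated. The sketch below parses a `curl -I -L` header dump and flags a final 200 response that advertises `Content-Length: 0`. The function names and the trimmed sample are illustrative assumptions, not part of GDAL or curl, and the sample keeps only a few of the headers shown above.

```python
from __future__ import annotations


def parse_header_dump(dump: str) -> list[tuple[int, dict[str, str]]]:
    """Split `curl -I -L` output into (status_code, headers) pairs, one per hop."""
    responses = []
    for block in dump.replace("\r\n", "\n").strip().split("\n\n"):
        lines = [ln for ln in block.split("\n") if ln.strip()]
        if not lines or not lines[0].startswith("HTTP/"):
            continue  # skip anything that is not a response block
        status = int(lines[0].split()[1])
        headers = {}
        for ln in lines[1:]:
            name, sep, value = ln.partition(":")
            if sep:
                headers[name.strip().lower()] = value.strip()
        responses.append((status, headers))
    return responses


def reports_empty_body(dump: str) -> bool:
    """Return True if the final hop is a 200 that advertises Content-Length: 0."""
    responses = parse_header_dump(dump)
    if not responses:
        return False
    status, headers = responses[-1]
    return status == 200 and headers.get("content-length") == "0"


# Trimmed copy of the output above (Location value shortened for illustration).
SAMPLE = """\
HTTP/1.1 301 Moved Permanently
Content-Length: 0
Location: https://download.data.world/file_download/...

HTTP/1.1 200 OK
Content-Type: text/csv
Content-Length: 0
"""

print(reports_empty_body(SAMPLE))
```

A client that trusts the advertised size of the final response would treat this CSV as empty, even though a GET request returns the data.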

agiudiceandrea commented 4 years ago

Thanks. Successfully tested with the latest build of GDAL 3.1 (d0541ea) from GISInternals (after updating curl-ca-bundle.crt https://github.com/gisinternals/buildsystem/issues/162).