Closed jpswinski closed 1 year ago
from sliderule import icesat2

# Request ATL06-SR elevations inside the polygon and sample the ArcticDEM mosaic at each one
parms = {
    'poly': [
        {'lon': -156.6430455278934, 'lat': 71.11303515926326},
        {'lon': -156.26446195120343, 'lat': 71.27727860723829},
        {'lon': -156.7080728245955, 'lat': 71.33780162227296},
        {'lon': -156.98688849401648, 'lat': 71.21627954209416},
        {'lon': -156.6430455278934, 'lat': 71.11303515926326}],
    't0': '2023-01-01T00:00:00Z',
    't1': '2024-01-01T00:00:00Z',
    'samples': {'mosaic': {'asset': 'arcticdem-mosaic', 'algorithm': 'NearestNeighbour'}}
}
gf = icesat2.atl06p(parms, version='006')
After some GDAL log sleuthing today with @elidwa and @dshean we noticed the PGC VRTs changed on Aug 9! And I'm fairly certain those changes are the root cause of the sliderule errors:
CPL_DEBUG=ON \
GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR \
AWS_NO_SIGN_REQUEST=YES \
CPL_CURL_VERBOSE=ON \
time \
gdallocationinfo -wgs84 /vsis3/pgc-opendata-dems/arcticdem/mosaics/v3.0/2m/2m_dem_tiles.vrt -150 70
Relevant bits of log below:
CURL_INFO_HEADER_IN: Last-Modified: Wed, 09 Aug 2023 16:54:52 GMT
CURL_INFO_HEADER_OUT: GET /arcticdem/mosaics/v3.0/2m/2m_dem_tiles.vrt HTTP/1.1
Host: pgc-opendata-dems.s3.amazonaws.com
User-Agent: GDAL/3.7.0
Accept: */*
Range: bytes=16384-4934014
CURL_INFO_HEADER_OUT: GET /arcticdem/mosaics/v3.0/2m/46_19/46_19_2_2_2m_v3.0_reg_dem.tif HTTP/1.1
CURL_INFO_HEADER_IN: Accept-Ranges: bytes
CURL_INFO_HEADER_IN: Content-Type: image/tiff
CURL_INFO_HEADER_IN: Server: AmazonS3
CURL_INFO_HEADER_IN: Content-Length: 2031645595
# Report:
# Location: (943312P,1766861L)
# Band 1:
# <LocationInfo></LocationInfo>
# Value: 116.615013122559
# 1.63user 2.77system 1:52.05elapsed
This is a 2GB 'Content-Length' transfer! Not a Range request. So it takes over a minute to read 1 pixel from a COG!
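To spot this pattern in a longer log without reading it by eye, something like the following might help. It is only a sketch: it assumes the CPL_CURL_VERBOSE output has the shape shown in the excerpt above (a `CURL_INFO_HEADER_OUT:` line starting each request, followed by unprefixed request header lines), and the function name `requests_without_range` is made up for illustration. The sample log is trimmed from the excerpt above, so any headers omitted from that excerpt are not seen by the parser.

```python
def requests_without_range(log_text):
    """Return the paths of GET requests that were sent without a Range header."""
    flagged = []
    current_path = None  # path of the request block currently being read
    has_range = False
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("CURL_INFO_HEADER_OUT:"):
            # A new outgoing request begins, so close out the previous one.
            if current_path is not None and not has_range:
                flagged.append(current_path)
            parts = line.split()
            # Assumed shape: CURL_INFO_HEADER_OUT: GET <path> HTTP/1.1
            is_get = len(parts) >= 3 and parts[1] == "GET"
            current_path = parts[2] if is_get else None
            has_range = False
        elif line.startswith("CURL_INFO_HEADER_IN:"):
            continue  # response header, not part of the request
        elif current_path is not None and line.lower().startswith("range:"):
            has_range = True
    if current_path is not None and not has_range:
        flagged.append(current_path)
    return flagged


# Trimmed copy of the log excerpt from this thread.
SAMPLE_LOG = """\
CURL_INFO_HEADER_OUT: GET /arcticdem/mosaics/v3.0/2m/2m_dem_tiles.vrt HTTP/1.1
Host: pgc-opendata-dems.s3.amazonaws.com
User-Agent: GDAL/3.7.0
Accept: */*
Range: bytes=16384-4934014
CURL_INFO_HEADER_OUT: GET /arcticdem/mosaics/v3.0/2m/46_19/46_19_2_2_2m_v3.0_reg_dem.tif HTTP/1.1
CURL_INFO_HEADER_IN: Accept-Ranges: bytes
CURL_INFO_HEADER_IN: Content-Length: 2031645595
"""

for path in requests_without_range(SAMPLE_LOG):
    print("no Range header:", path)
```

On the sample, only the TIF request is flagged; the VRT request carries a Range header and is skipped.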
The problem is that the entire TIF is downloaded rather than fetched with a byte-range request. I think this is because the new VRTs do not specify /vsicurl/ as a prefix to each TIF. If we make that change, getting a pixel value takes ~0.6 seconds:
wget https://pgc-opendata-dems.s3.amazonaws.com/arcticdem/mosaics/v3.0/2m/2m_dem_tiles.vrt
sed 's,https:,/vsicurl/https:,g' 2m_dem_tiles.vrt > 2m_dem_tiles_v3.vrt
time gdallocationinfo -wgs84 2m_dem_tiles_v3.vrt -150 70
# Value: 116.615013122559
#real 0m0.595s
#user 0m0.185s
#sys 0m0.039s
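For anyone who would rather patch the VRT from Python than with sed, a rough equivalent might look like this. It is a sketch, not a drop-in tool: `add_vsicurl_prefix` is a made-up helper name, and the sample line uses a hypothetical URL rather than a real entry from the PGC VRT. Unlike the sed one-liner, it skips URLs that already carry the prefix, so running it twice does no harm.

```python
import re


def add_vsicurl_prefix(vrt_text):
    """Prefix every 'https:' with '/vsicurl/' unless it is already prefixed."""
    # The negative lookbehind skips URLs that already start with /vsicurl/.
    return re.sub(r"(?<!/vsicurl/)https:", "/vsicurl/https:", vrt_text)


# Hypothetical source entry, for illustration only.
SAMPLE_LINE = (
    '<SourceFilename relativeToVRT="0">'
    "https://example.com/tiles/tile_a.tif"
    "</SourceFilename>"
)

patched = add_vsicurl_prefix(SAMPLE_LINE)
print(patched)
```

To apply it to the downloaded file, read 2m_dem_tiles.vrt as text, pass it through the function, and write the result to a new file, as the sed command does.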
Also noting the same problem with the new ArcticDEM version 4.1 (https://www.pgc.umn.edu/data/arcticdem/):
https://pgc-opendata-dems.s3.amazonaws.com/arcticdem/mosaics/v4.1/2m_dem_tiles.vrt
I created a branch, arcticdem_mosaics_v4.1, with the code and test changes needed to run against the latest ArcticDEM v4.1 mosaics. Unfortunately, some tests cannot complete because the box runs out of virtual memory. In particular, tests that calculate zonal stats or resample POI over some area using different algorithms cause a loss of connectivity to the remote server. This makes sense given what Scott found.
The VRT for v3.0, before PGC changed it, had relative TIF paths:
"
while after updates it now has
"
The GDAL vsis3 driver, which uses curl, cannot do its byte-range magic, as Scott pointed out.
The ArcticDEM vrts have been updated by PGC to use absolute paths with /vsis3/. This has resolved the issue.
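If the VRTs change again, a quick check of which prefix the tile sources use could catch it early. The sketch below counts `SourceFilename` entries by prefix using only the standard library. The helper name `classify_sources`, the category labels, and the sample VRT (with made-up bucket and tile names) are all assumptions for illustration, not taken from the PGC files.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def classify_sources(vrt_xml):
    """Count SourceFilename entries in a VRT by the kind of path they use."""
    root = ET.fromstring(vrt_xml)
    counts = Counter()
    for elem in root.iter("SourceFilename"):
        path = (elem.text or "").strip()
        if path.startswith("/vsis3/"):
            kind = "/vsis3/"
        elif path.startswith("/vsicurl/"):
            kind = "/vsicurl/"
        elif path.startswith(("http://", "https://")):
            kind = "bare URL"  # the form that triggered full downloads in this thread
        elif elem.get("relativeToVRT") == "1":
            kind = "relative"
        else:
            kind = "other"
        counts[kind] += 1
    return counts


# Hypothetical two-tile VRT, for illustration only.
SAMPLE_VRT = """<VRTDataset rasterXSize="2" rasterYSize="1">
  <VRTRasterBand dataType="Float32" band="1">
    <SimpleSource>
      <SourceFilename relativeToVRT="0">/vsis3/example-bucket/tiles/tile_a.tif</SourceFilename>
    </SimpleSource>
    <SimpleSource>
      <SourceFilename relativeToVRT="0">https://example.com/tiles/tile_b.tif</SourceFilename>
    </SimpleSource>
  </VRTRasterBand>
</VRTDataset>"""

print(dict(classify_sources(SAMPLE_VRT)))
```

Any nonzero "bare URL" count would suggest the VRT has regressed to the form that caused the slow reads described above.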
Attempting to sample the mosaic raster has stopped working.
From the production servers, we see this error message:
From a local run of the server code, the system core dumped: