Closed: intUnderflow closed this issue 3 years ago.
Hey cool idea, this would also make it easy to point opentopodata at the mapzen tiles on AWS!
A proper solution would look something like this:
- Support cloud paths in the config, like path: gs://www-opentopodata-org-public/test-srtm90m-subset/
- Instead of using .glob for finding raster files in the dataset folder, detect a cloud path prefix and do the file lookup with something like boto or rclone
- Rewrite raster paths to the /vsigs/www-opentopodata-org-public/test-srtm90m-subset/N00E010.hgt format, which rasterio/gdal can read (/vsis3/ for s3, /vsiaz/ for azure)
I think it's doable, and I'll think about it some more: it might impact some future features I have planned, like building a spatial index of rasters and inferring raster tile size.
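The prefix detection and path rewriting steps could be sketched like this. This is just an illustration of the idea, not opentopodata's actual code: the function names and the prefix table are hypothetical.

```python
# Hypothetical sketch: detect a cloud prefix in a configured dataset path
# and rewrite it to a GDAL virtual filesystem path that rasterio can open.

CLOUD_PREFIXES = {
    "gs://": "/vsigs/",   # Google Cloud Storage
    "s3://": "/vsis3/",   # Amazon S3
    "az://": "/vsiaz/",   # Azure Blob Storage
}

def is_cloud_path(path: str) -> bool:
    """True if the dataset path points at a cloud bucket rather than local disk."""
    return any(path.startswith(p) for p in CLOUD_PREFIXES)

def to_gdal_vsi_path(path: str) -> str:
    """Rewrite e.g. gs://bucket/key.hgt to /vsigs/bucket/key.hgt; local paths pass through."""
    for prefix, vsi in CLOUD_PREFIXES.items():
        if path.startswith(prefix):
            return vsi + path[len(prefix):]
    return path

print(to_gdal_vsi_path("gs://www-opentopodata-org-public/test-srtm90m-subset/N00E010.hgt"))
# → /vsigs/www-opentopodata-org-public/test-srtm90m-subset/N00E010.hgt
```

The actual file lookup for cloud paths would then go through boto/rclone instead of .glob, with the results fed through a rewrite like this before handing them to gdal.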
For now, it's possible to get close to this using a VRT.
I have files test-srtm90m-subset/N00E010.hgt and test-srtm90m-subset/N00E011.hgt.zip in the GCS bucket www-opentopodata-org-public. Ideally you'd convert the rasters to cloud-optimised GeoTIFFs.
You can build a VRT with
gdalbuildvrt data/gcloud/dataset.vrt /vsigs/www-opentopodata-org-public/test-srtm90m-subset/N00E010.hgt /vsigs/www-opentopodata-org-public/test-srtm90m-subset/N00E011.hgt.zip
which is an XML file that opentopodata/gdal understands, listing the rasters and their bounds:
<VRTDataset rasterXSize="2401" rasterYSize="1201">
  <SRS>GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]]</SRS>
  <GeoTransform> 9.9995833333333337e+00, 8.3333333333333339e-04, 0.0000000000000000e+00, 1.0004166666666667e+00, 0.0000000000000000e+00, -8.3333333333333339e-04</GeoTransform>
  <VRTRasterBand dataType="Int16" band="1">
    <NoDataValue>-32768</NoDataValue>
    <ComplexSource>
      <SourceFilename relativeToVRT="0">/vsigs/www-opentopodata-org-public/test-srtm90m-subset/N00E010.hgt</SourceFilename>
      <SourceBand>1</SourceBand>
      <SourceProperties RasterXSize="1201" RasterYSize="1201" DataType="Int16" BlockXSize="1201" BlockYSize="1" />
      <SrcRect xOff="0" yOff="0" xSize="1201" ySize="1201" />
      <DstRect xOff="0" yOff="0" xSize="1201" ySize="1201" />
      <NODATA>-32768</NODATA>
    </ComplexSource>
    <ComplexSource>
      <SourceFilename relativeToVRT="0">/vsigs/www-opentopodata-org-public/test-srtm90m-subset/N00E011.hgt.zip</SourceFilename>
      <SourceBand>1</SourceBand>
      <SourceProperties RasterXSize="1201" RasterYSize="1201" DataType="Int16" BlockXSize="1201" BlockYSize="1" />
      <SrcRect xOff="0" yOff="0" xSize="1201" ySize="1201" />
      <DstRect xOff="1200" yOff="0" xSize="1201" ySize="1201" />
      <NODATA>-32768</NODATA>
    </ComplexSource>
  </VRTRasterBand>
</VRTDataset>
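As a sanity check on the VRT, the DstRect offsets follow from the geotransform: each source tile is placed at the pixel offset of its west edge within the mosaic. SRTM .hgt tiles are named by their south-west sample, and the raster's west edge sits half a pixel further west, which is why the mosaic origin is 9.99958° rather than 10°. The numbers below come straight from the VRT.

```python
# Verify the DstRect xOff values in the VRT by converting each tile's
# west edge into a pixel offset using the mosaic's geotransform.
origin_lon = 9.9995833333333337      # GeoTransform origin (west edge of mosaic)
pixel_w = 8.3333333333333339e-04     # pixel width in degrees (1/1200)

# West edge of each tile = named longitude minus half a pixel
# (SRTM samples are registered on cell centres).
tile_n00e010_west = 10.0 - pixel_w / 2
tile_n00e011_west = 11.0 - pixel_w / 2

x_off_010 = round((tile_n00e010_west - origin_lon) / pixel_w)
x_off_011 = round((tile_n00e011_west - origin_lon) / pixel_w)
print(x_off_010, x_off_011)  # 0 1200, matching the two DstRect entries
```

The offset of 1200 rather than 1201 also shows why rasterXSize is 2401: adjacent SRTM tiles share their edge column, so the two 1201-pixel tiles overlap by one pixel.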
With this config:
datasets:
- name: gcloud
  path: data/gcloud/
and launching docker with access credentials:
docker run -it -v /home/XXX/opentopodata/data:/app/data:ro -p 5000:5000 -e GS_SECRET_ACCESS_KEY=XXX -e GS_ACCESS_KEY_ID=XXX -e CURL_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt opentopodata:1.2.1
then http://localhost:5000/v1/gcloud?locations=0.5,11.5
works just fine.
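To exercise the endpoint from code, assuming the container from the docker command above is running locally, you can parse the response like this. The JSON shape is Open Topo Data's standard response format; the elevation value in the sample is illustrative, not a real measurement.

```python
# Parse an Open Topo Data response for the /v1/gcloud endpoint above.
import json
from urllib.request import urlopen

URL = "http://localhost:5000/v1/gcloud?locations=0.5,11.5"

def parse_elevations(body: str) -> list:
    """Extract the elevation values from an Open Topo Data JSON response."""
    data = json.loads(body)
    if data.get("status") != "OK":
        raise RuntimeError(data.get("error", "unknown error"))
    return [r["elevation"] for r in data["results"]]

# Sample response body (elevation value illustrative):
sample = '{"results": [{"elevation": 472.0, "location": {"lat": 0.5, "lng": 11.5}}], "status": "OK"}'
print(parse_elevations(sample))  # [472.0]

# Against the live container:
# elevations = parse_elevations(urlopen(URL).read().decode())
```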
I think the only disadvantage of the VRT (other than it being an almighty hassle) is that it has to do a bunch of bounds checks to find the correct file, which will be slow if you have lots of rasters, but probably still negligible compared to reading a tif over HTTP.
It would be nice if we could mount data from Google Cloud Storage and Amazon S3, especially when using opentopodata as a container.
Right now, to use opentopodata as a docker container, we either have to mount the data inside a Kubernetes deployment (which means we need to keep the data on persistent disks) or bake the data into the docker image at build time. When there's a lot of data, this can really slow things down, especially deploying new nodes and the build process itself.
With GCS / S3 support, we could simply tell opentopodata which GCS or S3 bucket contains our data, build a container with that configuration and then deploy it.
There are alternatives like gcsfuse, but they rely on permissions not normally given to docker containers, and definitely not available on cloud providers.