ajnisbet / opentopodata

Open alternative to the Google Elevation API!
https://www.opentopodata.org
MIT License

GCS Support #7

Closed · intUnderflow closed 3 years ago

intUnderflow commented 4 years ago

It would be nice if we could mount data from Google Cloud Storage and Amazon S3, especially when using opentopodata as a container.

Right now, to use opentopodata as a docker container, we either have to mount the data inside a Kubernetes deployment (which means keeping the data on Persistent Disks) or bake the data into the docker image at build time. This can really slow things down when there's a lot of data, especially when deploying new nodes and during the build itself.

With GCS / S3 support, we could simply tell opentopodata which GCS or S3 bucket contains our data, build a container with that configuration and then deploy it.

There are alternatives like gcsfuse, but they rely on privileges that aren't normally given to docker containers, and that definitely aren't available on managed cloud platforms.
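(For reference, running a FUSE-based mount like gcsfuse inside docker needs something like the sketch below; the image and bucket names are hypothetical, and most managed platforms won't grant these flags.)

# SYS_ADMIN and /dev/fuse are what FUSE itself requires; my-image / my-bucket are placeholders
docker run -it --cap-add SYS_ADMIN --device /dev/fuse my-image gcsfuse my-bucket /app/data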

ajnisbet commented 4 years ago

Hey cool idea, this would also make it easy to point opentopodata at the mapzen tiles on AWS!
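For example, something like this untested sketch might build a VRT straight from the public terrain tiles bucket (the skadi path layout is my assumption from the open dataset, and AWS_NO_SIGN_REQUEST is GDAL's config option for unsigned requests to public buckets):

AWS_NO_SIGN_REQUEST=YES gdalbuildvrt data/aws/dataset.vrt /vsis3/elevation-tiles-prod/skadi/N00/N00E010.hgt.gz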

A proper solution would be native support for pointing a dataset at a GCS or S3 bucket. I think it's doable, I'll think about it some more; it might impact some future features I have planned, like building a spatial index of rasters and inferring raster tile size.


For now, it's possible to get close to this using a VRT.

I have files test-srtm90m-subset/N00E010.hgt and test-srtm90m-subset/N00E011.hgt.zip in the GCS bucket www-opentopodata-org-public. Ideally you'd convert rasters to Cloud Optimised GeoTIFFs first.
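That conversion is a one-liner with a recent GDAL (the COG output driver needs GDAL 3.1+; filenames here are just examples):

gdal_translate -of COG -co COMPRESS=DEFLATE N00E010.hgt N00E010.tif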

You can build a vrt with

gdalbuildvrt data/gcloud/dataset.vrt /vsigs/www-opentopodata-org-public/test-srtm90m-subset/N00E010.hgt /vsigs/www-opentopodata-org-public/test-srtm90m-subset/N00E011.hgt.zip

which produces an XML file that opentopodata/GDAL understands, listing the rasters and their bounds:

<VRTDataset rasterXSize="2401" rasterYSize="1201">
  <SRS>GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]]</SRS>
  <GeoTransform>  9.9995833333333337e+00,  8.3333333333333339e-04,  0.0000000000000000e+00,  1.0004166666666667e+00,  0.0000000000000000e+00, -8.3333333333333339e-04</GeoTransform>
  <VRTRasterBand dataType="Int16" band="1">
    <NoDataValue>-32768</NoDataValue>
    <ComplexSource>
      <SourceFilename relativeToVRT="0">/vsigs/www-opentopodata-org-public/test-srtm90m-subset/N00E010.hgt</SourceFilename>
      <SourceBand>1</SourceBand>
      <SourceProperties RasterXSize="1201" RasterYSize="1201" DataType="Int16" BlockXSize="1201" BlockYSize="1" />
      <SrcRect xOff="0" yOff="0" xSize="1201" ySize="1201" />
      <DstRect xOff="0" yOff="0" xSize="1201" ySize="1201" />
      <NODATA>-32768</NODATA>
    </ComplexSource>
    <ComplexSource>
      <SourceFilename relativeToVRT="0">/vsigs/www-opentopodata-org-public/test-srtm90m-subset/N00E011.hgt.zip</SourceFilename>
      <SourceBand>1</SourceBand>
      <SourceProperties RasterXSize="1201" RasterYSize="1201" DataType="Int16" BlockXSize="1201" BlockYSize="1" />
      <SrcRect xOff="0" yOff="0" xSize="1201" ySize="1201" />
      <DstRect xOff="1200" yOff="0" xSize="1201" ySize="1201" />
      <NODATA>-32768</NODATA>
    </ComplexSource>
  </VRTRasterBand>
</VRTDataset>

With this config

datasets:
- name: gcloud
  path: data/gcloud/

and launching docker with access credentials

docker run -it -v /home/XXX/opentopodata/data:/app/data:ro -p 5000:5000 -e GS_SECRET_ACCESS_KEY=XXX -e GS_ACCESS_KEY_ID=XXX -e CURL_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt opentopodata:1.2.1

then http://localhost:5000/v1/gcloud?locations=0.5,11.5 works just fine.
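You can sanity-check it from the command line too (the elevation value below is a placeholder, not real output):

curl "http://localhost:5000/v1/gcloud?locations=0.5,11.5"
# {"results": [{"elevation": ..., "location": {"lat": 0.5, "lng": 11.5}}], "status": "OK"}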

I think the only disadvantage of the VRT (other than it being an almighty hassle) is that GDAL has to do a bunch of bounds checks to find the correct file, which will be slow if you have lots of rasters, but probably still negligible compared to reading a tif over http.
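If you want to measure that overhead without opentopodata in the loop, you could time a direct GDAL query against the VRT:

time gdallocationinfo -valonly -wgs84 data/gcloud/dataset.vrt 11.5 0.5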

ajnisbet commented 3 years ago

Closing for now. I documented some ways to hack around this here; native support is on the roadmap.

Feel free to reopen with any issues not covered by the above hacks; that would make cloud storage support a higher priority.