DHI / terracotta

A light-weight, versatile XYZ tile server, built with Flask and Rasterio :earth_africa:
https://terracotta-python.readthedocs.org
MIT License

Adapt to AWS λ #9

Closed dionhaefner closed 6 years ago

mrpgraae commented 6 years ago

I don't think that this should be part of the LBST sprint?

dionhaefner commented 6 years ago

Let's talk about it - I'm still strongly in favor of a λ deployment, even if we do the analysis locally, but I haven't heard all the arguments.

j08lue commented 6 years ago

I am also split on this issue. It would be very convenient not to have to worry about timely data ingestion and preparation because someone else is taking care of this... But we would need to e.g. calculate NDVI on the fly and maybe pay for L2A data for that? Let's discuss this tomorrow.
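For reference, the per-pixel index calculation mentioned here is small. The sketch below assumes the red and near-infrared bands are already loaded as arrays of reflectance values; the function name `ndvi` and the choice to return NaN where both bands are zero are illustrative assumptions, not anything taken from Terracotta.

```python
import numpy as np


def ndvi(red, nir):
    """Return NDVI = (NIR - Red) / (NIR + Red), NaN where the denominator is zero."""
    red = np.asarray(red, dtype="float64")
    nir = np.asarray(nir, dtype="float64")
    total = nir + red
    # Pre-fill with NaN so pixels with a zero denominator stay undefined.
    out = np.full(total.shape, np.nan)
    np.divide(nir - red, total, out=out, where=total != 0)
    return out
```

The arithmetic itself is cheap; the cost of doing it on the fly would mostly come from reading and cutting the input bands for every request.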

mrpgraae commented 6 years ago

We will have to do a lot of stuff on-the-fly (cutting Landsat tiles, calculating indices, etc.), also no cloud optimization (might be significant when we have to do a lot of processing) and there's potentially a significant amount of extra development time in this. More cons than pros, as far as I can tell, but maybe I'm not seeing all the pros.

I'm all for making Terracotta run on λ, but I'm not sure that a project with a hard August 1st deadline is the right time to do it. Unless we actually gain something significant?

dionhaefner commented 6 years ago

Why do it on the fly? My idea was to pre-process everything, cloud-optimize it, dump it in an S3 bucket, and let Terracotta serve it.

mrpgraae commented 6 years ago

Then we still need to have a local workflow for all the pre-processing. It would make sense if we could avoid having to deal with infrastructure, but we still need to host a webserver and a geoserver for the static shape-stuff (correct me if I'm wrong @j08lue).

Also, only clients who are authorized on the webserver should be able to query Terracotta. I believe we were going to achieve this by having Terracotta run on the webserver machine, make it unreachable from the outside and then just have the webserver query it. I'm not sure how to accomplish that with lambda, unless we build authentication into Terracotta?

mads-gras commented 6 years ago

We already have a GeoServer for the shapefiles we need - it's running on ncr102. We also have a dedicated webserver (called GRASWEB) located outside the DHI firewall. We do not really need much security in terms of access to datasets on Terracotta at this stage. It's fine that anyone who guesses the right URL can see the datasets as it is now. But for the 2017 version of the LBST site, the imagery was also protected, and for that it was needed.

mrpgraae commented 6 years ago

Okay. It was my impression at the LBST meeting last week, that access should be for authorized users only. Will we need it at a later stage?

mads-gras commented 6 years ago

Access to the site needs to be restricted to authorized users, yes, but access to the raster layers as such does not. And yes, we will likely need this later.

dionhaefner commented 6 years ago

I implemented token authentication on the bathymetry app, so we could just steal that. Serving vectors is easy, so my idea for the infrastructure was local analysis, push to S3, Terracotta serves everything. No infrastructure required, zero maintenance, can run forever as long as we pay the AWS bills (i.e., S3 bills - Lambda is free for us).
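A minimal sketch of what a token check of this kind could look like, assuming a single shared bearer token sent in the `Authorization` header. This is not the code from the bathymetry app; the names `EXPECTED_TOKEN` and `is_authorized` are hypothetical.

```python
import hmac

# Hypothetical placeholder: a real deployment would load this from
# configuration or a secret store, never from source code.
EXPECTED_TOKEN = "change-me"


def is_authorized(headers, expected_token=EXPECTED_TOKEN):
    """Return True if the request headers carry the expected bearer token."""
    auth = headers.get("Authorization", "")
    scheme, _, token = auth.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(token.encode(), expected_token.encode())
```

A request handler would call `is_authorized(request.headers)` before serving a tile and return a 401 response otherwise.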

dionhaefner commented 6 years ago

Doesn't make much sense if we don't let Terracotta serve the vectors, of course.

mrpgraae commented 6 years ago

Yes, if we could get rid of the GeoServer and serve everything through Terracotta, Lambda would make a lot of sense, since we get rid of the infrastructure. Then we will need to redo the plan/overview that we made last week @j08lue.

mads-gras commented 6 years ago

For the shapefiles, GeoServer does more than just show the shapes. That's how we get all the attribute data that is shown in the info-panel, and it's also used for the zones/tiles-structure.

It would be nice to get rid of GeoServer, but my take was just that it was simpler to keep that part as is, and focus on where we have to make changes.

j08lue commented 6 years ago

The role of GeoServer in the frontend is currently:

  1. Serve out (the respective relevant subset of) 500 MB of fields features
  2. Make clickable field polygons - e.g. click on field changes raster layer underneath showing the best data for that field
  3. Serve out field metadata to populate tables

Is it feasible to replace all that with vector tiles from Terracotta? Can we generate vector tiles that not only show the feature but also include all the attributes? I know very little about this (yet), but we can allocate a few man days for that tops, I'd say.
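To make the question concrete, here is a rough sketch of the tile-wise subsetting that any XYZ-based replacement would need: compute the geographic bounds of a tile, then keep only the features that overlap it, with their attributes attached. It does not produce actual vector tiles, and it is not Terracotta functionality; the records in `FIELDS` and the names `tile_bounds` and `features_in_tile` are hypothetical.

```python
import math


def tile_bounds(x, y, z):
    """Return (west, south, east, north) in degrees for XYZ tile x/y at zoom z."""
    n = 2 ** z

    def lat(row):
        # Inverse Web Mercator: latitude of the top edge of tile row `row`.
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    west = x / n * 360.0 - 180.0
    east = (x + 1) / n * 360.0 - 180.0
    return west, lat(y + 1), east, lat(y)


def features_in_tile(features, x, y, z):
    """Keep features whose bounding box overlaps the tile; attributes travel along."""
    west, south, east, north = tile_bounds(x, y, z)
    return [
        f for f in features
        if f["bbox"][0] <= east and f["bbox"][2] >= west
        and f["bbox"][1] <= north and f["bbox"][3] >= south
    ]


# Hypothetical field records: bbox is (west, south, east, north) in degrees.
FIELDS = [
    {"bbox": (12.4, 55.6, 12.5, 55.7), "properties": {"field_id": 1, "crop": "wheat"}},
    {"bbox": (-70.1, -33.5, -70.0, -33.4), "properties": {"field_id": 2, "crop": "maize"}},
]
```

Calling `features_in_tile(FIELDS, 1, 0, 1)` keeps only the first record, because the second lies in the western hemisphere. Whether the attribute payload stays small enough per tile is the part that would need measuring against the 500 MB dataset.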

j08lue commented 6 years ago

> I implemented token authentication on the bathymetry app, so we could just steal that.

nice

j08lue commented 6 years ago

> my idea for the infrastructure was local analysis, push to S3, Terracotta serves everything. No infrastructure required, zero maintenance, can run forever as long as we pay the AWS bills (i.e., S3 bills - Lambda is free for us).

😻 But we need to get a feel for the feasibility of this.

I think this is really worth investigating, because a Terracotta with those capabilities on AWS would make for a real transferable setup - we could spin up new web apps very easily and cheaply with such a tool.

dionhaefner commented 6 years ago

Let's draft the API tomorrow and figure out what the requirements really are.

mrpgraae commented 6 years ago

> I think this is really worth investigating, because a Terracotta with those capabilities on AWS would make for a real transferable setup - we could spin up new web apps very easily and cheaply with such a tool.

I agree, but we really need to figure out if this is the right project to start developing this.

> Let's draft the API tomorrow and figure out what the requirements really are.

:+1: