OpenGeoscience / geonotebook

A Jupyter notebook extension for geospatial visualization and analysis
Apache License 2.0

Allow TMS-served tiles to be displayed on the map #135

Closed: jpolchlo closed this 6 years ago

jpolchlo commented 6 years ago

This PR makes it possible to display map layers that are furnished by a remote service with a TMS-like web interface. With this change, any data-handling back end that implements a simple protocol may display map data in the GeoNotebook web interface.

[Note: What we are calling "TMS" may not adhere strictly to the definition of a TMS server. We'd like to name this interface in a way that is consistent with the community's conventions, such that we don't create confusion further down the line. We'll take recommendations regarding the proper nomenclature to use.]

The interface for the TMS server assumes a standard power-of-two pyramid established over the global extent using a Web Mercator projection. At zoom level n, there are 2^n partitions of the extent along each axis, indexed from 0 to 2^n - 1, with the Cartesian product of the two axes' partitions giving the space of (x, y) keys for which a tile may exist. [Note: At the moment, we index using the Google/Bing/OSM convention where (0, 0) corresponds to the extreme northwest tile. If necessary, one could accommodate a flag to indicate that the tiles are being served in accordance with the TMS standard of an origin in the extreme southwest.]
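To make the indexing concrete, here is a minimal sketch of the usual Web Mercator tile arithmetic under the northwest-origin convention, plus the row flip that a southwest-origin flag would need. The function names `lonlat_to_tile` and `flip_y` are made up for illustration and are not part of this PR.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) key of the tile containing a lon/lat point in degrees.

    Uses the Google/Bing/OSM convention: (0, 0) is the extreme northwest tile.
    """
    n = 2 ** zoom  # number of partitions along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on or beyond the edge of the extent stay in the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def flip_y(y, zoom):
    """Convert a row index between northwest-origin and southwest-origin schemes."""
    return 2 ** zoom - 1 - y
```

The flip is its own inverse, so the same helper converts in either direction.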

To add a TMS-provided layer, one needs to provide an object that adheres to the following simple interface:

class TMS:
    def bind(self, host, port=None):
        # binds a service to a host at a specified port (random port if port=None)
        # subsequent calls to bind should fail until unbind is called
        raise NotImplementedError
    def unbind(self):
        # stops the tile server from listening for further requests
        # subsequent calls to bind to a new host/port may restart the service
        raise NotImplementedError
    @property
    def host(self):
        # returns the name of the host listening for tile requests
        raise NotImplementedError
    @property
    def port(self):
        # returns the port number the service is listening for requests on
        raise NotImplementedError
    @property
    def url_pattern(self):
        # returns a string template of the URL tile requests can be made on
        # the special tokens {z}, {x}, and {y} should be used as placeholders
        # for the zoom level and x and y coordinates, respectively
        raise NotImplementedError

Objects that are duck-typed with this interface (provide these methods/properties) may be wrapped in a TMSRasterData (from geonotebook.wrappers) and passed to M.add_layer as per usual. The TMS server will be bound to a port at the time of creation; removing a TMS-backed layer will unbind the service. [Note: I have not implemented special handling of the standard styling options (e.g., opacity) and have no expectation that they will have any effect. A subsequent PR could address this.]
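For a sense of what a duck-typed object might look like, here is a minimal sketch built only on the Python standard library. Everything in it is an assumption made for illustration: the class name `SketchTMS`, the `render` callback, and the `/tile/{z}/{x}/{y}.png` path are hypothetical, and this is not how GeoPySpark implements its server.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class SketchTMS:
    """Hypothetical tile server exposing bind/unbind/host/port/url_pattern."""

    def __init__(self, render):
        self._render = render  # assumed callable: (z, x, y) -> bytes of a PNG
        self._server = None
        self._host = None

    def bind(self, host, port=None):
        if self._server is not None:
            raise RuntimeError("already bound; call unbind() first")
        render = self._render

        class Handler(BaseHTTPRequestHandler):
            def do_GET(self):
                try:
                    prefix, z, x, y = self.path.strip("/").split("/")
                    if prefix != "tile":
                        raise ValueError(prefix)
                    body = render(int(z), int(x), int(y.rsplit(".", 1)[0]))
                except ValueError:
                    self.send_error(404)
                    return
                self.send_response(200)
                self.send_header("Content-Type", "image/png")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)

            def log_message(self, *args):
                pass  # keep notebook output quiet

        # Port 0 asks the operating system for any free port.
        self._server = ThreadingHTTPServer((host, port or 0), Handler)
        self._host = host
        threading.Thread(target=self._server.serve_forever, daemon=True).start()

    def unbind(self):
        if self._server is not None:
            self._server.shutdown()      # stop the serve_forever loop
            self._server.server_close()  # release the listening socket
            self._server = None
            self._host = None

    @property
    def host(self):
        return self._host

    @property
    def port(self):
        return self._server.server_address[1] if self._server else None

    @property
    def url_pattern(self):
        # Doubled braces survive format(), leaving literal {z}, {x}, {y} tokens.
        return "http://{}:{}/tile/{{z}}/{{x}}/{{y}}.png".format(self.host, self.port)
```

Once bound, `tms.url_pattern.format(z=3, x=1, y=2)` gives the URL for a single tile; per the description above, an object of this shape is what would be wrapped in TMSRasterData.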

kotfic commented 6 years ago

@jpolchlo Thanks! I'll start looking at this today/tomorrow

jpolchlo commented 6 years ago

As far as testing is concerned, we have been using locationtech-labs/geopyspark to produce TMS servers. We can work together to build a test notebook if you need.

aashish24 commented 6 years ago

@jpolchlo this is great and thank you for the contribution.

kotfic commented 6 years ago

@jpolchlo I'm still trying to wrap my head around this a little. Do you have an example of the kind of object that TMSRasterData is supposed to wrap? Specifically, it seems like the primary motivation for the bind/unbind functionality is handling randomly assigned port values?

If you could give me a better sense of what goes on inside the unbind function that TMSRasterData calls, I think it would help me understand a little better. If I'm not mistaken, needing to call unbind() is what motivates the introduction of disgorge and the _server_state variable passing through kernel.py/layer.py?

jpolchlo commented 6 years ago

Yes, disgorge and _server_state are there to support clean shutdown of the TMS servers—not specifically to handle the random port assignment, though. Conceivably, the vis server could pass through a port assignment. It also seemed like a reasonable way to introduce state to the vis servers, in case that is useful in the future. I half expect that there's something I didn't consider in the system design that makes this dangerous.

The TMS object that I'm working with defines unbind() as in https://github.com/locationtech-labs/geopyspark/blob/master/geopyspark/geotrellis/tms.py#L176-L184, which calls out to https://github.com/locationtech-labs/geopyspark/blob/master/geopyspark-backend/geotrellis/src/main/scala/geopyspark/geotrellis/tms/Server.scala#L44-L48 (which in turn calls into the Scala code in that directory). It is entirely intended to shut down the server, free ports, and generally clean up after itself. In this case, there are background processes that need to be terminated; they aggregate requests so that we make fewer calls out to Spark.

jpolchlo commented 6 years ago

This is a great idea; I hadn't noticed the option of creating a new reader type, but it's fairly clear that it's the right thing to do. It may also resolve some of the questions below regarding bind and the TMS API. If we were to use RasterData('http://localhost:<port>/.../{z}/{x}/{y}.png') (though the argument for a tms scheme is readily apparent), then it removes the responsibility from the reader class to start up the TMS endpoint, as it obviously needs to exist before that call can be made. In this case, using GeoPySpark constructs, adding a TMS layer will take the following form:

tms = TMS.build(...)
tms.bind()
M.add_layer(RasterData(tms.url_pattern), display_option_1, display_option_2)

while removing it will just require M.remove_layer to be followed by tms.unbind().

The problem with such a solution is that TMS objects can easily be lost this way, leaving servers that keep listening on their ports, backed by data whose references can never be released. It's easy to suggest that the user needs to take ultimate responsibility, but in an interactive environment with rapid iteration, it should be expected that these objects will be mishandled unless their management is rolled into the reader. That's the motivation for having the ingest/disgorge pairing manage the bind/unbind process. So, for the sake of ease of use, and in the interest of reducing the bookkeeping burden on the user, figuring out how to make RasterData(tms) into valid syntax is worthwhile. If we must rely on a string argument so that the scheme can be extracted and routed through the entry point mechanism, then some significant ease of use will be lost. It already seems as if VRTs get some special treatment inside the Ktile vis server, so figuring out how special display objects can generalize would be worthwhile. I'd love to get your thoughts on this.

I think in either case, documentation of the reader interface would be of use.

And, yes, it seems that if I move over to using the appropriate reader structure, then we will remove the need for state to be carried, and that change can be reverted.