I'm using this issue as a place to record my thoughts about various ways of providing offline background mapping.
The problem
A user will be using Jupyter notebooks to perform analysis on ship data stored in Pepys. They want to be able to display the ship data on a map (probably using Folium, as that has excellent integration with geopandas and makes it easy to plot the ship data - but it could be a similar library) and contextualise the data by showing some sort of background map. However, the analyst's computer will be disconnected from the internet for security reasons, so web-based background maps cannot be used.
Using folium offline
This has already been solved, by creating a module called offline_folium which will download the required JS/CSS files and then alter folium to use the downloaded versions. See here for more details.
Background map data wanted
A variety of background map data could be useful to analysts. This includes:
Coastlines
More detailed vector mapping - mostly land-focused (even land-focused mapping gives useful context for coastal operations)
More detailed vector mapping - water-focused (such as depth contours, political boundaries & names of geographic features)
Raster nautical charts
The areas covered by these data could be small parts of the UK, the whole of the UK, or the whole world.
Options
GeoJSON coastline data
This is the simplest option. Some GeoJSON files could be provided with Pepys which contain coastline data for various regions (Scotland, UK, Europe, World). The analyst decides which of these they want to display, and adds a line like display_coastlines('europe', m) to their notebook (where m is a reference to the folium Map object that they have created). This uses the built-in folium GeoJSON support, and displays the GeoJSON on the map. It can be styled in various ways (different widths of line, shading of land area etc).
This data can be provided at various levels of detail, with a corresponding range of file sizes. A quick experiment has shown that the UK coastline can take anything from < 1MB to > 10MB of GeoJSON data to store, depending on the level of generalisation. At a low level of detail, the whole world's coastlines can fit in 10MB of GeoJSON, but this may be too coarse to be useful for analysts (as ships in the Solent, for example, could be shown as being within the Isle of Wight's coarsened coastline).
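As a rough illustration of the wrapper described above, display_coastlines could look something like the sketch below. The coastlines folder, the one-file-per-region naming and the styling parameters are all assumptions made for the example, and nothing like this exists in Pepys yet. The only real folium feature used is folium.GeoJson with its style_function argument.

```python
import json
from pathlib import Path

# Hypothetical layout: one GeoJSON file per region, shipped alongside Pepys.
COASTLINE_DIR = Path("coastlines")
REGIONS = ("scotland", "uk", "europe", "world")


def coastline_path(region):
    """Map a region name to its (assumed) GeoJSON file."""
    key = region.lower()
    if key not in REGIONS:
        raise ValueError(f"Unknown region {region!r}; expected one of {REGIONS}")
    return COASTLINE_DIR / f"{key}.geojson"


def display_coastlines(region, m, line_weight=1, fill_opacity=0.1):
    """Add the coastline for `region` to the folium map `m`."""
    # Imported here so that coastline_path() still works where folium is absent.
    import folium

    with open(coastline_path(region), encoding="utf-8") as f:
        data = json.load(f)

    layer = folium.GeoJson(
        data,
        name=f"Coastline ({region})",
        # The same style is applied to every feature in the file.
        style_function=lambda feature: {
            "color": "black",
            "weight": line_weight,
            "fillOpacity": fill_opacity,
        },
    )
    layer.add_to(m)
    return layer
```

In a notebook this would be called as display_coastlines('europe', m) once m has been created with folium.Map(...).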
Advantages:
Really simple: distribute some files and write a wrapper function that uses built-in folium functionality
Note from IanM: I believe client analysts would also be able to generate their own GeoJSON datasets, giving a wide range of capabilities for visualisation (density maps, buffers around objects, etc). I believe folium supports GeoJSON out of the box
Disadvantages:
Behind the scenes, folium embeds the whole of the GeoJSON file into the HTML page that is hosting the Jupyter notebook. Therefore, big files can considerably slow down the rendering of the map and the responsiveness of the notebook in general. (Embedding is required because JavaScript running in a Jupyter notebook can't access the file-system of the computer for security reasons, so everything has to be read by the Python code and embedded in the HTML). Note from IanM: here is some code to upload data to the Jupyter server. It may require careful management/monitoring, but it may be an alternative to loading the data into HTML.
The above point could become a big problem if we're trying to deal with the coastlines of the whole world at a reasonable level of detail.
There is no way to provide different levels of detail at different zoom levels: loading a coastline of the UK and viewing the whole country at once will use just as much data, and slow things down just as much, even if you never zoom in to see the detail.
This gives no information for areas of sea, and no context for the land (apart from the shape of the coastline) - giving the analysts relatively little context. Note from IanM: It is possible to get/produce GeoJSON labelled point data for place-names on maps.
Nautical charts
UKHO charts are available to the analysts as GeoTIFF files. These can be displayed as a background layer on the map, and would give analysts a lot of context for the sea areas (plus some for the coastal land areas). There are a couple of ways of doing this:
Loading an individual chart
We could provide a folder of charts on the server that hosts Pepys, and the user could select a specific chart to load (this would require them knowing the name of the chart etc, but they may be used to choosing a chart anyway in software like ArcGIS, or we could provide some sort of lookup to guess a good chart for the area). We would then display it on the folium map.
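The "lookup to guess a good chart for the area" could be as simple as a bounding-box intersection test against a small index of chart footprints. The sketch below is one hedged way of doing that: the ChartFootprint class, the chart names used with it and the idea of ranking by footprint area are all assumptions for illustration, and it ignores charts that cross the antimeridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChartFootprint:
    """Hypothetical index entry: a chart name and its extent in degrees."""
    name: str
    west: float
    south: float
    east: float
    north: float

    def intersects(self, west, south, east, north):
        """True if this chart overlaps the given bounding box."""
        return not (
            self.east < west
            or self.west > east
            or self.north < south
            or self.south > north
        )

    @property
    def area(self):
        # Area in square degrees; only used for ranking, not measurement.
        return (self.east - self.west) * (self.north - self.south)


def charts_for_view(charts, west, south, east, north):
    """Return charts overlapping the view, smallest footprint first.

    Smaller footprints are assumed to be larger-scale (more detailed) charts.
    """
    hits = [c for c in charts if c.intersects(west, south, east, north)]
    return sorted(hits, key=lambda c: c.area)
```

The bounding box passed in could come from the extent of the ship data being plotted, so the analyst is offered the most detailed charts first.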
This would require extending folium to be able to display GeoTIFF files. There is a plugin for Leaflet maps to allow displaying GeoTIFFs (see here), and it would be relatively easy to extend folium to be able to use this. However, the big problem would be that the folium JS code can't access the file-system on the computer - so the maps would have to be hosted somewhere that can serve them over HTTP. Although this sounds like a deal-breaker, a simple HTTP server could be run on the server that hosts the master copy of Pepys, or a local server could be run on each analyst's machine (only when the Jupyter notebook is running - it could be started in a separate process by the Pepys Admin tool). Both of these would just point at the folder containing the GeoTIFF files, and serve them all over HTTP.
For efficient display we would want to convert the GeoTIFFs to Cloud Optimized GeoTIFFs (COGs), but that is very easy to do.
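A minimal sketch of the "simple HTTP server" idea, using only the Python standard library, is shown below. The function name, default port and the decision to send a permissive CORS header are assumptions for the example. The header is there because the notebook page and the chart server would be on different origins, so the browser would otherwise refuse the request. Whether this server copes with the partial (range) requests that COG readers rely on would need checking before relying on it.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class ChartRequestHandler(SimpleHTTPRequestHandler):
    """Serves files from one folder and allows cross-origin requests."""

    def end_headers(self):
        # Let JavaScript in the notebook page (a different origin) fetch charts.
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


def start_chart_server(chart_dir, host="127.0.0.1", port=8000):
    """Serve `chart_dir` over HTTP in a background thread.

    Returns the server object; call .shutdown() on it to stop serving.
    Passing port=0 lets the operating system pick a free port.
    """
    handler = functools.partial(ChartRequestHandler, directory=str(chart_dir))
    server = ThreadingHTTPServer((host, port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Binding to 127.0.0.1 keeps the server private to the analyst's machine, which suits the per-machine option. A central server would bind to an address reachable on the LAN instead.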
Advantages:
Lots more detail and information for the water areas
Relatively little effort (compared to some other approaches)
Disadvantages:
Requires writing a new folium plugin
Requires running a server - either centrally, or on-demand on analysts' machines (neither of which is a massive deal, as it would literally be one of the HTTP servers that is built into the Python standard library)
Only allows display of a few charts at a time, and each has to be loaded manually. Even if we provide some nice lookup function from the bounding box of the map to a list of charts to display, we wouldn't be able to add charts automatically when the user scrolls outside of that area
Raster data, so although it is beautifully-styled, it won't necessarily resample well for different zoom levels
Loading a mosaic of multiple charts
We could provide a way of serving a tile layer like the standard OpenStreetMap tiles (lots of little square images that together make a map, and that are available at a range of resolutions) but composed of a mosaic of the UKHO charts. This could be done with a tool like TiTiler. This is a slightly more involved server to run - so should probably only be run in a central place on the network - but is all written in Python and is pretty easy to deploy. In simple terms, it takes a list of GeoTIFF files (well, COGs actually - to make it efficient) and produces tiles on the fly at the relevant zoom level, and deals with mosaicing everything together.
As this is a tile layer just like OpenStreetMap, it doesn't require any extensions to folium, as folium can already display tile layers.
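To show how little is needed on the folium side, here is a hedged sketch of adding such a tile layer. The server address, the mosaic file location and the helper names are hypothetical. The tile endpoint path is also an assumption: it differs between TiTiler versions, so it is left as a parameter to be checked against whatever version is deployed. folium.TileLayer itself is standard folium.

```python
from urllib.parse import quote


def chart_tile_url(titiler_base, mosaic_url, endpoint="mosaicjson/tiles"):
    """Build an XYZ tile URL template for a mosaic served by TiTiler.

    `endpoint` is an assumed path; check it against the deployed TiTiler.
    The {z}/{x}/{y} placeholders are filled in by Leaflet for each tile.
    """
    base = titiler_base.rstrip("/")
    encoded = quote(mosaic_url, safe="")
    return f"{base}/{endpoint}/{{z}}/{{x}}/{{y}}?url={encoded}"


def add_chart_layer(m, titiler_base, mosaic_url):
    """Add the chart mosaic to the folium map `m` as a background layer."""
    # Imported here so chart_tile_url() can be used without folium installed.
    import folium

    layer = folium.TileLayer(
        tiles=chart_tile_url(titiler_base, mosaic_url),
        attr="UKHO",  # folium requires an attribution for custom tiles
        name="UKHO charts",
    )
    layer.add_to(m)
    return layer
```

An analyst would then call something like add_chart_layer(m, "http://tile-server:8000", "file:///data/charts/mosaic.json"), where both values are placeholders for whatever is set up on the LAN.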
Advantages:
Doesn't require any extensions to folium
Provides the full set of UKHO charts over the whole UK area (or world-wide if they have access to GeoTIFF charts for the whole world)
Provides a lot of detail over the water areas
Disadvantages:
A more complex system to set up
Requires deploying a TiTiler server somewhere on their LAN (though it would be possible to run it on analysts' individual machines - I've run it on a local machine a number of times with no problem). Deployment isn't too hard, but does add extra things we need to do.
Raster data, so although it is beautifully-styled, it won't necessarily resample well for different zoom levels
More detailed vector mapping
Not sure how relevant this is, as most of the vector data we have will be over land. However, it could provide a nicer way of dealing with more detailed vector coastline data (compared to the GeoJSON option). This would involve storing vector data (such as coastlines, but potentially much more - right up to basically all the OpenStreetMap data) in the PostGIS database, and then serving it up as either raster tiles (like OpenStreetMap tiles) or vector tiles (little chunks of vector data, that can be styled on the fly).
Raster tiles
To create raster tiles on the fly we'd need to set up an OpenStreetMap tile server, or similar. I don't have that much experience with this, so I'll leave it there. Note from IanM: I've looked into this a couple of times. The last time I looked, it was an 80GB download, then load that into a Postgres instance, then (optionally) initialise a tile-cache for likely areas of interest, and lastly run a web-server to serve the tiles. I believe client Tech Support staff could handle this.
Vector tiles
PostGIS can create vector tiles on the fly now (exciting new feature!), and there is a very lightweight server that can go in front of PostGIS called pg_tileserv which can serve them easily across a network. So, we could run this on the same server that runs Postgres, and get folium to connect to it. Unfortunately folium doesn't support vector tiles by default, but there is a Leaflet extension that does - and we could write an extension for folium to work with this Leaflet extension.
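To make the "little chunks" idea concrete, the sketch below works out which tile covers a given point at a given zoom level (the standard Web Mercator tile scheme used by OpenStreetMap-style layers) and builds the URL a client would request for it. The URL layout of layer name followed by z/x/y.pbf is my understanding of how pg_tileserv exposes tables and should be treated as an assumption, as should the host, port and table name used with it.

```python
import math


def tile_for_point(lon, lat, zoom):
    """Return the (x, y) tile indices containing a lon/lat point.

    Uses the usual Web Mercator XYZ scheme: 2**zoom tiles per side,
    x counted from the antimeridian, y counted from the north.
    """
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edges still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def vector_tile_url(base_url, layer, lon, lat, zoom):
    """Build the URL of the vector tile covering a point.

    `layer` is assumed to be a schema-qualified table name, for example
    "public.coastlines" (a hypothetical table).
    """
    x, y = tile_for_point(lon, lat, zoom)
    return f"{base_url.rstrip('/')}/{layer}/{zoom}/{x}/{y}.pbf"
```

Only the tiles that intersect the current map view are ever requested, which is why the amount of data sent to the browser stays small however much is stored in PostGIS.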
Advantages:
Very flexible - can display any vector data the clients want to use
Uses the PostGIS database that we already have set up
Would allow us to store high-resolution coastline data for the whole world in PostGIS and just serve the relevant bits to the client - and those bits wouldn't be included raw into the HTML (unlike the GeoJSON approach)
Allows us to style the data however we want
Doesn't have any raster zoom effects - giving perfectly sharp lines at any resolution
Disadvantages:
Requires running a server
Requires extending folium
Can only display the vector data the client has access to. If they don't have vector data for the water areas then the best they can display is OpenStreetMap-style data for the land
OpenStreetMap data is very detailed and very large, so OSM data for the whole of the UK (or world!) would be very large.
Concluding thoughts
This is the result of various bits of research I've done, plus other experience I have (for example, I've been using and deploying TiTiler in my other work at the moment). I don't know which is best, as it will depend very much on the client's requirements. I hope these notes on advantages and disadvantages will be useful - and I hope the notes will help any future developer implement one of these solutions. Happy to discuss further, just wanted to get these thoughts out of my mind and into a semi-permanent place.