markerikson opened 6 years ago
I know this doesn't really answer the question, but this might be useful:
https://www.npmjs.com/package/mbtiles-terrain-server
Obviously requiring node and things is annoying, but this looks like it may do what you're after.
I... actually seem to have added working support for writing directly to MBTiles this afternoon :)
A basic implementation appeared to be working right before I had to leave. I've got several more aspects of it I want to look at, though, including de-duping inserted tiles.
We actually have our own little homegrown Python server that can serve up tiles from either TMS folders on disk, or by retrieving them from MBTiles files. Wish I could post the code for that, but it's "proprietary". Really, though, it's only like 250 lines of Python, and half of that is looking in subfolders for available tilesets and listing them internally as available. The actual logic for handling a request is straightforward - take ZXY values, query DB, return blob.
Fortunately, since CTB is an open-source project, I do intend to file a PR that adds the write-to-MBTiles capability once I'm sure it's working well.
(Side note: a quick test of a 5x5-degree 90m dataset took 1:35 to write out 17K tiles. Writing that same dataset into an MBTiles file only took 35 seconds :) Let's hear it for not touching the disk as much!)
So that was with heightmaps. Just did another run with quantized-mesh tiles. Here's how all this looks together. The calculations generated 17,685 tiles, and this was on a Win10 machine, NTFS file system.
**Time (minutes)**

| | Heightmaps | Quantized-Mesh |
|---|---|---|
| Tile files | 1:35 | 2:20 |
| MBTiles | 0:35 | 1:45 |
**Size (MB)**

| | Heightmaps | Quantized-Mesh |
|---|---|---|
| Tile files | 72.1 | 28.4 |
| MBTiles | 76.2 | 31.9 |
**Size used on disk (MB)**

| | Heightmaps | Quantized-Mesh |
|---|---|---|
| Tile files | 109 | 54 |
| MBTiles | 76.2 | 31.9 |
Summarizing: MBTiles output is consistently faster to write, and although the raw file is slightly larger, it uses considerably less actual space on disk than thousands of individual tile files.
Very interesting report, thanks for sharing, Mark.
I just tried using MD5 hashing to detect duplicate tiles and normalize them, as seen in some other MBTiles implementations, but an initial test run against that same block showed no duplicates. My immediate guess is that terrain tiles (especially quantized-mesh format) are unlikely to be bit-for-bit identical, whereas images may be likelier to see duplication. (I suppose the most likely candidates for duplicates in either case would be over the ocean, and what I've seen is that most terrain datasets only cover land areas for obvious reasons.)
I'll stash those changes and leave them out for now. No point in the expense of running MD5 hashes if there's not going to be any dupes found.
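For anyone curious, the de-duplication approach seen in other MBTiles implementations splits the `tiles` table into a `map` table (coordinates → tile ID) and an `images` table (tile ID → blob), with a `tiles` view joining them so readers are unaffected. A minimal sketch, using an MD5 of the blob as the tile ID (names follow the common `map`/`images` convention, not CTB code):

```python
import hashlib
import sqlite3

DEDUP_SCHEMA = """
CREATE TABLE map (zoom_level INTEGER, tile_column INTEGER,
                  tile_row INTEGER, tile_id TEXT);
CREATE TABLE images (tile_id TEXT PRIMARY KEY, tile_data BLOB);
CREATE VIEW tiles AS
    SELECT map.zoom_level, map.tile_column, map.tile_row, images.tile_data
    FROM map JOIN images ON map.tile_id = images.tile_id;
"""

def write_tile_deduped(con, z, x, y, data):
    """Store a tile, sharing the blob with any identical tile already seen."""
    tile_id = hashlib.md5(data).hexdigest()
    # Identical blobs hash to the same tile_id, so the blob is stored once.
    con.execute("INSERT OR IGNORE INTO images VALUES (?, ?)",
                (tile_id, sqlite3.Binary(data)))
    con.execute("INSERT INTO map VALUES (?, ?, ?, ?)", (z, x, y, tile_id))
```

As noted above, this only pays off when duplicates actually exist, so skipping the hashing for terrain data seems reasonable.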
Yeah, thinking about it further... since heightmaps are just grids of height values, empty tiles might well be bit-for-bit identical. However, since quantized-mesh files by definition encode lat/lon and ECEF coordinates, tiles in different locations can't be identical.
I'm doing some tests with a VRT that uses four 1x1-degree tiles, sliced from the corners of a 5x5-degree block, as the sources. GDAL interprets all of the empty space in the middle as heights of 0, so you'd expect those tiles to produce identical results.
For a quantized-mesh MBTiles output, all 17865 tiles are unique. Many of them appear to be 131 bytes, but the bytes are different.
For the same source and heightmap MBTiles output, I see 3105 unique tiles out of 17865. Oddly, the MBTiles file size was about the same with and without de-duplication.
Since my current end goal is to actually generate a quantized-mesh dataset for myself, I'm going to stash the de-duplication changes and keep moving.
Progress update: my remaining goal is to have CTB intelligently generate terrain tiles only for areas with actual data, so we don't waste large amounts of time and disk space generating tiles for areas like the oceans.
The problem is that a worldwide terrain dataset creates a worldwide bounding box. Depending on your dataset you might not have any actual terrain data for the oceans, but the oceans are included in the bounding box. So, CTB will try to iterate over the entire earth to generate tiles, and that will include a lot of "empty" ocean space (either as zeros, or actual NODATA values).
GDAL 2.2 introduced a "sparse datasets" capability, which lets us query blocks of interest to see whether they contain valid data. As of last night, I've been able to use that in `TerrainTiler.cpp` to check whether the area covered by a tile is empty.
From here, my plan is:
1) Add logic in both `TerrainTiler.cpp` and `MeshTiler.cpp` to mark a tile as "invalid" when its area contains no data, and skip the terrain warping for that tile entirely
2) In the tile iterator threads, collect the coordinates of all valid tiles
3) Use a clustering algorithm like DBSCAN to group nearby tile coordinates on each zoom level
4) Compute the bounding boxes for those clusters, for each level
5) Write the multiple bounding boxes into the `layer.json` metadata file
I figure this should result in a drastic reduction in both final output size and processing time for a large worldwide dataset.
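For steps 3 and 4, since tile coordinates live on an integer grid, a DBSCAN-style pass with a one-tile neighborhood reduces to finding 8-connected components, which can be done with a simple flood fill and no external dependencies. A minimal sketch (illustrative only, not CTB code):

```python
from collections import deque

def cluster_bounding_boxes(coords):
    """Group (x, y) tile coords into 8-connected clusters and return
    each cluster's bounding box as (min_x, min_y, max_x, max_y)."""
    remaining = set(coords)
    boxes = []
    while remaining:
        seed = remaining.pop()
        queue = deque([seed])
        cluster = [seed]
        # Breadth-first flood fill over the 8-neighborhood.
        while queue:
            cx, cy = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    n = (cx + dx, cy + dy)
                    if n in remaining:
                        remaining.remove(n)
                        queue.append(n)
                        cluster.append(n)
        xs = [p[0] for p in cluster]
        ys = [p[1] for p in cluster]
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes
```

Run per zoom level, the resulting boxes are exactly what `layer.json`'s `available` array wants: a list of tile rectangles per level.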
Awesome idea.
This feels like it would be the first step towards allowing us to add new rasters to existing terrain sets. As I see it, there are basically three major cases to consider.
I’d be willing to have a crack at adding this functionality once you’ve got your stuff working, but my C++ is pretty poor, so if someone who knows what they’re doing wanted to beat me to it, I’d be fine with that.
CTB currently writes individual terrain tiles directly to disk. As zoom levels increase, this leads to millions of individual terrain tile files in thousands of folders, which are hard to copy and move around.
The MBTiles file format is a widely used container for image and terrain tiles. It would be great if CTB supported writing tiles directly into a designated MBTiles container file.
We're planning to tackle regenerating our own imagery and terrain datasets within the next few months. It'd be great if someone happened to implement MBTiles capability before then. If not, I may be able to tackle it myself, although it would help if someone who's more familiar with this codebase could offer advice on the best approach for doing so.