Fixes for received tpk files

MurrayMcDonald16 commented 4 years ago

We were originally receiving MBTiles files to use with an application we had developed for a client. The client switched to sending TPK files so we looked for a way to convert them to MBTiles format and found this code base. I ran into a number of issues trying to convert the supplied files and am providing the modifications I made to accommodate them in this branch. Changes are detailed below:

The provided files consisted of WTMS zoom levels of 9-15 but the conf.xml had the LevelID values of 0-6. The corresponding "resolution" values of the LODInfo elements were correct and could be used to calculate the actual WTMS zoom levels (one must have the correct zoom level number to properly "flip" the y-axis.
The provided files had differing "bounds" values in four different places: cd.conf, conf.cdi, iteminfo.xml and two values in mapserver.json: "initialextent" and "fullextent". The value in the iteminfo.xml was the "most correct" but still not perfect so I decided to add an option to the MbTiles export to calculate the bounds from the highest exported zoom level.
The provided files happened to have a large number of fully transparent tiles and the code used to combine tiles into an image doesn't need such tiles in the retrieved tileset so I added an option to the MBTiles export to drop transparent tiles.
Finally some of the provided files had bundles that contained row/columns that were out-of-range for the given zoom level. I'm not sure how this happens or what tools they are using to produce the TPK files but this was problematic so I added coded to discard out-of-range tiles.

Thanks very much, the conversion to MBTiles saved a lot of hassle!

coveralls commented 4 years ago

Coverage increased (+0.1%) to 90.329% when pulling c70f383f82ed52e9e340a737716895796d0319a2 on MurrayMcDonald16:fixes_for_received_tpk_files into bab9afd01e11c219e6ae0368cb84fdb80f94d485 on consbio:master.

brendan-ward commented 4 years ago

@MurrayMcDonald16 thank you for submitting this PR! Very sorry for the delays in reviewing it, I hope to be able to do so before too much longer.

brendan-ward commented 4 years ago

@MurrayMcDonald16 is it possible to share a tiny TPK that demonstrates the problems your fixes here resolve, so that we can include them in testing? Or if you aren't comfortable with it making it out as a text fixture, send it to me directly at bcward@astutespruce.com so that I can test locally?

There's a few things that I want us to change about your approach here, but I want to make sure that my suggestions are guided by the underlying data.

For example, we can derive tile size from the tile package instead of hardcoding it, and I think is possible to keep the API focused on zoom levels (from a tile service perspective) and then decode the internal ordinal levels during reading the tileset so the user doesn't need to know that they are different. I think this would simplify some of the implementation...

MurrayMcDonald16 commented 4 years ago

@brendan-ward The files are on my desktop at work, none of them are "tiny" but I will unzip them and try to just include levels/bundles that demonstrate the issues and then re-zip them as TPKs. I will do that tomorrow.

BTW -- I have a Compact Cache V2 "tpkx" file (ARCGIS 10.3) that I will also send you. The CCV2 has the index directly incorporated into the "bundle" file with each entry being 8 bytes long. There is a 5-byte little-endian data offset value followed by a 3-byte little endian data length value, The index is stored in "row-major" order as opposed to the V1 index which is stored in column-major order.

brendan-ward commented 4 years ago

@MurrayMcDonald16 if it is as simple as modifying the conf.xml file and level directories within the _alllayers directory, I can create a test file from one of our existing files by deleting the first few levels. I'll try this now.

Perhaps just testing that calculating bounds from a given zoom level works properly is sufficient, and we don't need to test with a file that specifically has differing bounds?

MurrayMcDonald16 commented 4 years ago

@brendan-ward

Yes, you can certainly "roll you own" test files -- in my case the file had WTMS zoom level 9 stored in "_alllayers\L00", WTMS zoom level 10 stored in "_alllayers\L01", etc. The conf.xml LODInfo had the corresponding level ids "0", "1", etc -- by using the LODInfo\Resolution element values one could calculate the "true" zoom levels.

The main thing to test with calculating the bounds is that given the tile coverage Cn,Rn .... Cx,Rx at a specific zoom level one gets the proper min latitude, min longitude, max latitude, max longitude values. I did verify this manually but didn't produce a unit test for that. Please let me know if you need anything from me. Take care. Murray

brendan-ward commented 4 years ago

@MurrayMcDonald16 what do you think about the idea of making bounds calculation always automatic based on the maximum zoom level exported? (not asking for changes, asking for opinion only)

Since we allow you to subset out specific zoom levels, it seems like we can leverage your idea of calculating bounds to do so in a way that matches what you export. E.g., if you only export zoom level 0, bounds corresponds to bounds of the world. The downside is losing information about the bounds of the data that were rendered, rather than the tiles.

The mbtiles spec is a bit ambiguous on this front, since "rendered map area" could be taken to mean the full tile that was rendered (even if mostly transparent).

MurrayMcDonald16 commented 4 years ago

@brendan-ward when I was first looking for sample TPK files via google I found one that had very different coverages at different zoom-levels (an ArcGis sample using the city of San Diego -- so far the real maps our customer has supplied have identical coverages at all the zoom levels) so I decided to make the bounds calculation be a user option. The bounds end up in the "metatdata" table of the MBTiles SQLlite database, so they can be easily modified by someone who knows what they are doing :-)

brendan-ward commented 4 years ago

@MurrayMcDonald16 I refactored your approach a bit and added associated tests where I could.

Key changes:

--calc-bounds option is now --tile-bounds (interpretation is the same)
added the hex digest for transparent tile to the list of empty tiles, and now screen for that in both to_mbtiles() and to_disk(). A completely transparent tile is effectively an empty tile, so there isn't a need for us to handle that as a special case.
refactored calculation of tile bounds to use mercantile since it already handles that well, and I added new APIs in pymbtiles 0.5.0 earlier today that give us the min and max rows / columns from a tileset, so we don't have to track that as part of reading tiles here.
simplified tracking of zoom levels vs levels of detail in the TPK file. We want to expose zoom_levels consistently to users, and have these always be zoom levels; we use lods internally to work back and forth between what is in the TPK file and associated zoom levels.
formatted things using black - this unfortunately added a bit of clutter to the modified lines here, but made things a lot easier to read.

I created a test file for zooms 2-4 of one of our existing test files and used that to ensure that non-standard zoom levels are handled as expected.

We don't yet have tests for dropping transparent / empty tiles.

Note: you'll need to install mercantile and the latest version of pymbtiles to test these changes.

Since your TPK files were somewhat special, would you mind testing them with my updates here to make sure this still handles your needs correctly?

Thanks again for submitting your PR!

Per your comment about TPKX files, please do share. Note there is also #24 where you could attach that.

brendan-ward commented 4 years ago

@MurrayMcDonald16 thanks again for contributing this! Please open an issue if something I introduced here breaks your use case.

consbio / tpkutils

Fixes for received tpk files #25