3DTILES_implicit_tiling: How to make subtree division easier to do correctly?

Suppose you had an implicit quadtree with maximumLevel: 7 (that is, the 8th level from the top, levels are 0-indexed). Depending on how you choose subtreeLevels, there are some tradeoffs:

subtreeLevels: 8 - means you have one subtree file. no bits are wasted
subtreeLevels: 4 - you now have up to 257 subtree files. 4 divides 8, so again no bits are wasted. Though there's many more files here.
subtreeLevels: 7 - this is the worst case. since there's 8 levels with data, the first 7 show up in the first subtree, but then there's 4^7 = 16384 subtrees (at most) and each one only use 1 bit for the root tile! that's (4^7 - 1) / (4 - 3) - 1 = 5460 bits wasted (plus padding), per subtree. In this case that's 11 MB of wasted space, and this is a relatively small example.

How can we avoid this? Some ideas:

just let compression handle it. All the wasted bits are 0, so at least it should compress well. However, you still have an exponential number of tiny files
Allow the availability buffers to be shorter than the required length, but only past maximumLevel. Each file is smaller, but again you have tons of tiny files
Add an implementation note with some tips on how to pick subtreeLevels - there's a few things to think about:
- Ideally you want maximumLevel + 1 to be divisible by subtreeLevels (or as close as possible), so you avoid wasted bits
- larger subtreeLevels are better up to a point. You want reasonably large files so there's fewer of them, but not so large that it takes forever to download them.

We want to make it easy for the tileset author to divide the subtrees efficiently, though what's the best way to do this, as it does vary a lot with the size of the tree

I've wondered about this too and I think it could come down to heuristics since there are so many different factors involved. One idea is to take all of these different factors and assign priorities to them at tiling time.

For example, each subtree level calculates file count overhead, file size overhead (compressed / decompressed), network request overhead, etc. Then have a default set weights or let the user provider them. The subtree levels with the minimum cost is chosen. The subtreeLevels: 7 case would be naturally deprioritized if the file count cost has a big enough weight relative to the other factors.

One could probably compute some numbers purely analytically, without explicitly creating the (implicit...) tileset. But the influence of the actual availability on the result is subtle: A single bit decides whether some huuuge buffer has to be used, or whether something collapses into a trivial constant:0. But FWIW, I just generated the files for full implicit quadtrees, with levels=[2...8] and subtreeLevels=[2...levels], where "full" means that all tiles and all content are available. The resulting overall sizes:

levels_2-subtreeLevels_2/=1016
levels_3-subtreeLevels_2/=6649
levels_3-subtreeLevels_3/=1017
levels_4-subtreeLevels_2/=4217
levels_4-subtreeLevels_3/=23545
levels_4-subtreeLevels_4/=1017
levels_5-subtreeLevels_2/=94331
levels_5-subtreeLevels_3/=23547
levels_5-subtreeLevels_4/=95227
levels_5-subtreeLevels_5/=1019
levels_6-subtreeLevels_2/=55420
levels_6-subtreeLevels_3/=13820
levels_6-subtreeLevels_4/=95228
levels_6-subtreeLevels_5/=443396
levels_6-subtreeLevels_6/=1028
levels_7-subtreeLevels_2/=1497212
levels_7-subtreeLevels_3/=1455612
levels_7-subtreeLevels_4/=95228
levels_7-subtreeLevels_5/=443396
levels_7-subtreeLevels_6/=2851844
levels_7-subtreeLevels_7/=1028
levels_8-subtreeLevels_2/=874622
levels_8-subtreeLevels_3/=1455614
levels_8-subtreeLevels_4/=52222
levels_8-subtreeLevels_5/=443398
levels_8-subtreeLevels_6/=2851846
levels_8-subtreeLevels_7/=28181510
levels_8-subtreeLevels_8/=1030

Of course, this is not "realistic", but may give a ballpark estimate (and ... I wanted to try stuff out ... so why not?). The problem with the exponential growth for the case of subtreeLevels=levels-1 becomes apparent (size~=32e^2.2\levels). Using subtreeLevels=ceil(levels / 2) seems to be the best one on this artificial data, but it's not yet clear whether it's possible to draw conclusions for "real" data. (In order to do experiments with "more realistic" artificial data, the question of what constitutes "realistic" had to be answered. For example, one could trivially create tilesets where only a certain (random?) amount of tiles/content are available, but whether this is "realistic" or not is debatable...)

CesiumGS / 3d-tiles

3DTILES_implicit_tiling: How to make subtree division easier to do correctly? #576