CesiumGS / 3d-tiles

Specification for streaming massive heterogeneous 3D geospatial datasets :earth_americas:

Suggestion for hierarchies in batch table #66

Closed arneschilling closed 7 years ago

arneschilling commented 8 years ago

We are working on ideas for supporting hierarchies in B3DM. The background is that we work with 3D GIS and BIM data (such as CityGML or IFC), which is usually more complicated than 3D visualization datasets, and we want to preserve as much information as possible. Currently there is a way to group nodes in GLTF. However, we also want to optimize rendering as much as possible and make use of batches, including attribute tables, in B3DM. In our scenarios, we have complex input data with groups, hierarchies, and attributes on different levels. For instance, a building has a unique id and a set of attributes and properties. The parts the building is made of also have a set of attributes, which are usually different. Sometimes we have three hierarchy levels or more. The application we are working on must support selecting and highlighting entities at all available levels and displaying the attached attributes (as key-value pairs). We want to click on buildings and display building-specific attributes, and we also want to click on building parts and display component-specific (wall, roof, window, door...) attributes without switching to another dataset with a different configuration. Which kind of element gets selected is controlled by a toggle button or by other means.

The current design of the batch table is quite simple: it is basically a 2D grid. Now the question is how we group batches together and attach attributes to these groups.

We found out that the number of columns is flexible and that it is not restricted to the number of batches. We can extend the batch table to include additional columns representing abstract features, for which we can include additional attributes. This is not specified anywhere, but it is currently happily consumed by Cesium.

@jbo023 has set up a demo application http://hosting.virtualcitysystems.de/demos/hierarchy/ Use the CityGML Explorer to access attributes and toggle between building selection and part selection using the buttons on the right.

Example B3DM file: http://hosting.virtualcitysystems.de/demos/hierarchy/examples/data/buildings_semantic/15/35210/6826.b3dm

There is no formal specification of our approach because we see it as a workaround that takes the current limitations of B3DM into account. Please let us know if somebody is working on a similar topic. We are happy to discuss possible solutions.

As to the example file above, you will see a batch table with ids, attributes (ignore the strange format for now...) and a row called parentPosition. The latter provides the information on how things are grouped together. E.g., the first two entries in this row are 1243, which means that these batches are grouped together. The id of this group can be found in the id array at position 1243. However, there is no batch with this number; it's just an abstract group feature.
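A minimal sketch of that lookup, assuming a made-up and heavily shortened batch table. Only the id array and the parentPosition row come from the description above; the id values, the function name getGroupId, and the use of -1 for "no parent" are assumptions for illustration.

```javascript
// Hypothetical, shortened batch table. Entries 0 and 1 are real batches
// (building parts); entry 2 is an abstract group that has no geometry.
const batchTable = {
    id: ['wall_a', 'roof_a', 'building_a'],
    // parentPosition[i] is the position of the group entry for batch i.
    // -1 is an assumed marker for "no parent".
    parentPosition: [2, 2, -1]
};

// Returns the id of the group a picked batch belongs to,
// or undefined if the batch has no parent.
function getGroupId(table, batchId) {
    const parent = table.parentPosition[batchId];
    return parent >= 0 ? table.id[parent] : undefined;
}

console.log(getGroupId(batchTable, 0)); // building_a
console.log(getGroupId(batchTable, 1)); // building_a
```

Both batches resolve to the same group entry, which is what lets the viewer highlight the whole building when either part is picked.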

What do you think of this approach? Does anybody have alternative ideas or suggestions to accomplish this? Our intention is to include this feature in the 3D Tiles specification so that we can base our framework on the master branch. We can try to formally describe our concept in the git repo and create a pull request.

best regards, Arne

pjcozzi commented 8 years ago

Thanks for all the details for this use case. Is this related to #65?

jbo023 commented 8 years ago

It's related, but allowing JSON objects in the batch table is independent of this discussion.

pjcozzi commented 8 years ago

If I understand correctly, you want to be able to have more items in the batch table than features, e.g., buildings, in the .b3dm file so that fields in the batch table can be used as an index to data in the batch table. Is this correct?

This is because you have a nested data structure you want to navigate?

It seems like an ability to carry an app-specific payload as part of the tile or to get the data from another web service would be a cleaner approach that is general enough to put into the core spec.

clausnagel commented 8 years ago

Yes, the problem is that the current design of the batch table is suitable for flat data but not for hierarchically structured data. So we would like to be able to express and access hierarchical data directly in the .b3dm file.

For example, assume a typical CityGML hierarchy: a building is described by a roof and wall surface and the wall surface contains a door.

building
  |- roof surface
  |- wall surface
      |- door

Now, for instance, we would like to be able to color only the roof surfaces of the buildings according to an attribute that is only assigned to the roof. Likewise, we would like to be able to color only the wall surfaces, or the entire building (which means coloring all its nested child elements).

And it should be possible to navigate the hierarchy. For instance, when clicking on the door it should be possible to retrieve its attributes but also its (transitive) parents and their attributes.

Please note that the extension of the batch table as done in the demo application is only a suggestion and the way we currently solved the problem. I think the core request is to have support for hierarchical data in the 3D Tiles specification. If there are other (better) ways than extending the batch table, we are also happy.

arneschilling commented 8 years ago

It's not necessarily an app-specific payload, since it's about having a general concept to deal with nested data, but I agree that not everybody will want to use this kind of feature, so it should be kept optional. We were trying to find a workaround so that we can navigate upwards starting with a picked object ID and find parent group nodes which we can highlight. Storing hierarchies and additional attributes in separate files is an option, but this would double the number of HTTP requests, and the contained information must be merged with the B3DM content anyway. That's why we were looking for a way to have it directly in B3DM.

pjcozzi commented 8 years ago

Given that an element in the batch table arrays can be an array, couldn't each feature just have an array of batch ids for its children? In your example, building would have [roof, wall]. Likewise, each feature could also have a batch id for its parent (or an array for parents). Would this work?

jbo023 commented 8 years ago

Hmm, the problem is the building batch entries, because a building does not have a geometry; only the children [roof, wall] have geometries.

pjcozzi commented 8 years ago

Ah, I see. If this is a common enough use case then we could add an optional app-specific payload to each tile (the data could be JSON, binary, etc.). If we have one more user (including me if I run into any cases) who needs this, then I say that justifies the minor spec complexity, and that we add it.

Otherwise, you could store it in a separate payload or have a convention where, for example, batch id 0 includes an object with the metadata for geometry-less features.

pjcozzi commented 8 years ago

Labeling this as draft 1.0 since the ability to have multiple ids (or however it is implemented) to identify buildings and facades, for example, is becoming commonly requested.

lilleyse commented 8 years ago

We have some ideas for defining a hierarchical batch table, and I'm curious to know what others think.

@arneschilling @jbo023 @clausnagel @pmconne

Summary

We want to be able to pick a feature in the tile and get information from its own metadata, as well as metadata from its parent, grandparent, etc. In the current batch table spec, this is only easily possible by flattening all the hierarchy’s metadata in each feature, resulting in a lot of duplicate data.

Batch Table Hierarchy

This approach rethinks the batch table in terms of a hierarchy of items, where each item has a “class” associated with it. It supports metadata for features and abstract “groups” that aren’t backed by geometry - like in @arneschilling's example where the walls and doors are features, but buildings aren’t.

Example

Number of doors = 4
Number of walls = 3
Number of buildings = 2
Number of zones = 1
Number of features = 7 (doors and walls)
Number of items = 10 (doors, walls, buildings, zones; buildings and zones are not backed by geometry, so they are abstract items)

Organized like:

{
    CLASSES : [
        {
            name : 'door',
            door_mass : [10, 11, 14, 7],
            door_width : [1.2, 1.3, 1.21, 1.5],
            door_name : ['door0', 'door1', 'door2', 'door3']
        },
        {
            name : 'wall',
            wall_paint : ["red", "green", "pink"],
            wall_windows : [4, 6, 1],
            wall_name : ['wall0', 'wall1', 'wall2']
        },
        {
            name : 'building',
            building_height : [100, 20],
            building_name : ["building0", "building1"]
        },
        {
            name : 'zone',
            zone_name : ["zone0"]
        }
    ],
    CLASS_ID : [0, 0, 1, 1, 0, 0, 1, 2, 2, 3],
    PARENT_ID : [2, 2, 7, 7, 6, 8, 8, 9, 9, -1]
}

CLASSES defines the classes. In this example there are 4 classes: doors, walls, buildings, and zones. Each class is like a mini batch-table - storing the properties for all items of that class. The arrays can be JS arrays or batch table binary arrays.

CLASS_ID stores the class of each item. In the above example item 0 is "door0", item 1 is "door1", item 2 is "wall0", item 3 is "wall1", and item 4 is "door2".

PARENT_ID stores the parent of each item, as an index into the CLASS_ID section. -1 means the item has no parent. In the above example "door0"'s parent is "wall0".
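To illustrate how a picked item can be traced up to the root, here is a small sketch that uses the CLASS_ID and PARENT_ID arrays from the example. The function name getAncestorChain and the CLASS_NAMES helper array are made up for this illustration and are not part of the proposal.

```javascript
// CLASS_ID and PARENT_ID copied from the example above.
const CLASS_ID = [0, 0, 1, 1, 0, 0, 1, 2, 2, 3];
const PARENT_ID = [2, 2, 7, 7, 6, 8, 8, 9, 9, -1];
// Helper for printing only: class names in CLASSES order.
const CLASS_NAMES = ['door', 'wall', 'building', 'zone'];

// Returns the item indices from the picked item up to the root.
// Follows PARENT_ID until it reaches -1 (no parent).
function getAncestorChain(itemId) {
    const chain = [];
    for (let id = itemId; id !== -1; id = PARENT_ID[id]) {
        chain.push(id);
    }
    return chain;
}

const chain = getAncestorChain(0); // start at "door0"
console.log(chain); // items 0, 2, 7, 9
console.log(chain.map((id) => CLASS_NAMES[CLASS_ID[id]])); // door, wall, building, zone
```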

Multiple Parents

In order to support an item having multiple parents, such as parents that act as classification tags, the approach can be extended so each item defines its parent count:

    CLASS_ID : [0, 0, 1, 1, 0, 0, 1, 2, 2, 3],
    PARENT_COUNT : [1, 2, 1, 1, 1, 1, 1, 1, 1, 0],
    PARENT_ID : [2, 2, 3, 7, 7, 6, 8, 8, 9, 9]

Now "door1" has two parents: "wall0" and "wall1".
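One way to read these two arrays is to walk PARENT_ID with a running offset. The sketch below assumes exactly the layout shown above; buildParentLists is a hypothetical helper name.

```javascript
// PARENT_COUNT and PARENT_ID copied from the example above.
const PARENT_COUNT = [1, 2, 1, 1, 1, 1, 1, 1, 1, 0];
const PARENT_ID = [2, 2, 3, 7, 7, 6, 8, 8, 9, 9];

// Builds, for every item, the list of its parent item indices.
// Item i owns the next PARENT_COUNT[i] entries of PARENT_ID.
function buildParentLists(parentCount, parentId) {
    const parents = [];
    let offset = 0;
    for (const count of parentCount) {
        parents.push(parentId.slice(offset, offset + count));
        offset += count;
    }
    return parents;
}

const parents = buildParentLists(PARENT_COUNT, PARENT_ID);
console.log(parents[1]); // parents of "door1": items 2 and 3 ("wall0", "wall1")
console.log(parents[9]); // "zone0" has no parents: empty list
```

A loader would also want to check that the counts sum to the length of PARENT_ID before trusting the data.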

Across Tiles

One challenge is supporting the concept of a hierarchical batch table across different tiles.

In 3D Tiles implementations intermediary tiles that contain batch table metadata may be unloaded, so allowing PARENT_IDs to reference external tiles is dangerous.

Another approach is to contain the full hierarchy in each tile. The downside is duplicate data across sibling tiles, which could be minimal in some use cases but worse in others. This should still improve the original situation because duplicate data is stored across tiles rather than across features.

Another downside here is it may be difficult for implementations to support editing batch table values since they would need to edit the duplicate data that exists.

Any ideas or feedback on this approach?

jbo023 commented 8 years ago

At first glance this looks like a nice concept. In the paragraph on CLASS_ID you probably meant "item 3 is wall1"; if not, I haven't understood the concept yet. Also, the PARENT_COUNT row probably has one item too many.

lilleyse commented 8 years ago

Thanks for the corrections @jbo023, should be fixed now.

jbo023 commented 8 years ago

If I want to get the attributes for a given batch ID:

I have to find the corresponding class ID, which is just an array access where the batch ID is the index. But to find the corresponding array position for the class attributes, I have to iterate over the CLASS_ID row and count the appearances of the class ID? Is this correct? This seems a bit expensive.

lilleyse commented 8 years ago

Yes it would require knowing how many appearances of CLASS_ID came before it. This can be done once at load time so that each item stores its index into its class's array. The alternative is providing another array of indices in the batch table hierarchy - possibly not worth the extra data.
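A rough sketch of the load-time pass, using a trimmed copy of the example data (only the name properties are kept). The names computeClassIndexes, CLASS_INDEX, and getProperty are illustrative, not taken from any implementation.

```javascript
// CLASS_ID copied from the example; CLASSES trimmed to the name properties.
const CLASS_ID = [0, 0, 1, 1, 0, 0, 1, 2, 2, 3];
const CLASSES = [
    { name: 'door', door_name: ['door0', 'door1', 'door2', 'door3'] },
    { name: 'wall', wall_name: ['wall0', 'wall1', 'wall2'] },
    { name: 'building', building_name: ['building0', 'building1'] },
    { name: 'zone', zone_name: ['zone0'] }
];

// Single pass at load time: for each item, count how many earlier items
// share its class. That count is the item's index into its class's arrays.
function computeClassIndexes(classIds, classCount) {
    const seen = new Array(classCount).fill(0);
    return classIds.map((classId) => seen[classId]++);
}

const CLASS_INDEX = computeClassIndexes(CLASS_ID, CLASSES.length);

// Constant-time property lookup once CLASS_INDEX exists.
function getProperty(itemId, propertyName) {
    const values = CLASSES[CLASS_ID[itemId]][propertyName];
    return values === undefined ? undefined : values[CLASS_INDEX[itemId]];
}

console.log(CLASS_INDEX); // 0, 1, 0, 1, 2, 3, 2, 0, 1, 0
console.log(getProperty(4, 'door_name')); // door2
```

The pass is linear in the number of items, so styling code never has to rescan CLASS_ID.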

jbo023 commented 8 years ago

Ah yeah I didn't think about doing this on load. I was more thinking about doing this on the fly for styling. But this is really not worth the extra data in the b3dm tile if we can just generate this info on load.

pmconne commented 8 years ago

This sounds pretty solid to me. In my use cases, the same fixed set of 'parent' items will tend to be shared amongst all tiles - but the ratio of parents to children will tend to be quite small, so duplication of parent data shouldn't be a big deal.

pjcozzi commented 8 years ago

Thanks for the prompt input @pmconne and @jbo023.

As @lilleyse and I discussed offline, I think this is a great approach. Here are some notes for the spec and ideas for the schema:

Spec Content and Terminology

Suggested Schema Changes

        {
            name : 'building',
            building_height : [100, 20],
            building_name : ["building0", "building1"]
        },
        {
            name : 'zone',
            zone_name : ["zone0"]
        }

It is awkward to differentiate if ["zone0"] is class data or the first class instance's data (in this case there are no class instances).

Please think through the schema and propose something, but it could be as simple as adding an instances object property, e.g.,

        {
            name : 'building',
            instances : {
                building_height : [100, 20],
                building_name : ["building0", "building1"]
            }
        },
        {
            name : 'zone',
            zone_name : "zone0"
        }

Implementation

Here are a few use cases that we want to make sure are reasonable:

What other cases should we consider?

lilleyse commented 8 years ago

> It is awkward to differentiate if ["zone0"] is class data or the first class instance's data (in this case there are no class instances).

In the example zone0 is still considered an instance of the "zone" class. I like the separation that the instances object provides, so the example JSON would be:

        {
            name : 'building',
            instances : {
                building_height : [100, 20],
                building_name : ["building0", "building1"]
            }
        },
        {
            name : 'zone',
            instances : {
                zone_name : ["zone0"]
            }
        }

Do we have a strong need to support class data? I figured we would have a required set of class properties like name, id, etc., but not allow for custom properties, since that data really belongs to the instances.

pjcozzi commented 8 years ago

> Do we have a strong need to support Class data?

The Gmail labels example is a good one to consider. There may be a number of classes, for example: friendly, neutral, enemy, air, sea, ground, and we want to assign these to each feature, e.g., [friendly, ground], but none of the classes are physical features. It feels like we would want per-class data, not a dummy class instance to assign data to the class. Does this complicate the implementation or spec significantly?

lilleyse commented 8 years ago

It doesn't complicate it too much. The main complication is that PARENT_ID could be either an index into the CLASS_ID array (when referring to an instance) or a reference to a CLASS's unique id. The CLASS's id would need to be greater than or equal to the number of items in the CLASS_ID array so that it's clear that an instance's parent is a class rather than an instance.

pjcozzi commented 8 years ago

Can you provide an example? It sounds like it might actually be easier to keep it as is even if it isn't as conceptually clean from a purist perspective.

lilleyse commented 8 years ago

Edit: this is not a proposed solution, just an example case

{
    CLASSES : [
        {
            name : 'door',
            id : 3,
            instances : {
                door_mass : [10, 11],
                door_width : [1.2, 1.3],
                door_name : ['door0', 'door1']
            }
        },
        {
            name : 'wall',
            id : 4,
            instances : {
                wall_paint : ["red"],
                wall_windows : [4],
                wall_name : ['wall0']
            }
        },
        {
            name : 'ground',
            id : 5
        }
    ],
    CLASS_ID : [0, 0, 1],
    PARENT_ID : [2, 5, 5]
}

door0's parent is wall0, door1's parent is ground, wall0's parent is ground.

Since here the ground class has no instances, door1 and wall0 set their parent id to the ground class id (which is 5). The ground class id can't be 0, 1, or 2 because a parent id set to that value would reference one of the 3 instances. So the limitation is that all CLASS ids need to be greater than the number of instances.
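To make the ambiguity concrete, this is roughly what a reader of such a file would have to do. It is only a sketch of the example case above, not a proposed design; resolveParent and its return shape are invented for illustration.

```javascript
// Data from the example case above, trimmed to the fields needed here.
const CLASSES = [
    { name: 'door', id: 3 },
    { name: 'wall', id: 4 },
    { name: 'ground', id: 5 }
];
const CLASS_ID = [0, 0, 1];
const PARENT_ID = [2, 5, 5];

// A parent value below the instance count is an index into CLASS_ID;
// anything else is interpreted as a class id.
function resolveParent(itemId) {
    const parent = PARENT_ID[itemId];
    if (parent < 0) {
        return undefined; // assumed marker for "no parent"
    }
    if (parent < CLASS_ID.length) {
        return { kind: 'instance', index: parent };
    }
    const parentClass = CLASSES.find((c) => c.id === parent);
    return parentClass ? { kind: 'class', name: parentClass.name } : undefined;
}

console.log(resolveParent(0)); // instance at index 2 ("wall0")
console.log(resolveParent(1)); // class "ground"
```

Every lookup needs this extra branch, and adding instances to a tile can force the class ids to be renumbered.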

Right now I'm more in favor of treating everything as an instance.

arneschilling commented 7 years ago

Hi,

regarding the terminology: I would like to stick to the concept of FEATURES. In GIS, features represent geometries with attributes. In my opinion it does not matter whether the geometry is explicit, i.e. defined as a GLTF mesh, or aggregated, i.e. a collection of meshes representing a whole building. A feature could also be an instance of a generic prototype (tree model) with custom instance attributes. So there is no need to distinguish between items that are backed by geometries and groups that are made of items. Both need metadata and can be used for nesting features.

I find CLASSES an appropriate term for features of a specific type. However, I would not bloat the CLASS concept with semantic meaning. My understanding is that classes represent features with a specific set of attributes. As in high-level programming languages, classes have a fixed set of fields (in this case attributes) that may or may not be set by instances. That's a nice concept because we have varying sets of attributes. A "zone" may have only an ID, whereas a door may have 20 or more attributes. Merging all attributes into a single table (single class) is possible but results in many empty table entries. Having multiple classes helps to compress the attribute payload. In our current implementation we use JSON objects containing attribute sets, but this creates redundancies as well, because the attribute name must be repeated every time, e.g.:

   "attributes" : [{"externalReference externalObjectName":"DENIAL4300009RYq","creationDate":"2016-07-06","gml:name":"DENIAL4300009RYq","HoeheGrund":"54.3",......

The classes concept sits in between these extremes. We can easily figure out which features we can associate with a class.

CLASSES : [
    {
        name : 'featuretype123',
        id : 3,
        instances : {
            'externalReference externalObjectName' : ['DENIAL4300009RYq', 'DENIAL430043345'],
            'creationDate' : ['2016-07-06', '2016-07-07'],
            'gml:name' : ['DENIAL4300009RYq', 'DENIAL4300009RYe'],
            'HoeheGrund' : [54.3, 14.4]
        }
    },
    {
        name : 'featuretype456',
        id : 4,
        instances : {
            'id' : ['zone123']
        }
    }
]
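As an illustration of the difference, the sketch below turns row-oriented attribute objects into the column-oriented instances layout, so each attribute name is stored once per class. The helper name toInstances and the choice of null for missing values are assumptions; the attribute values are taken from the example above.

```javascript
// Row-oriented attributes: every object repeats the attribute names.
const attributes = [
    { 'gml:name': 'DENIAL4300009RYq', creationDate: '2016-07-06', HoeheGrund: 54.3 },
    { 'gml:name': 'DENIAL4300009RYe', creationDate: '2016-07-07', HoeheGrund: 14.4 }
];

// Converts rows into one array per attribute name.
// Attributes missing from a row are stored as null (an assumed convention).
function toInstances(rows) {
    const instances = {};
    rows.forEach((row, i) => {
        for (const [key, value] of Object.entries(row)) {
            if (!(key in instances)) {
                instances[key] = new Array(rows.length).fill(null);
            }
            instances[key][i] = value;
        }
    });
    return instances;
}

const featureClass = { name: 'featuretype123', instances: toInstances(attributes) };
console.log(featureClass.instances.HoeheGrund); // 54.3, 14.4
```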

Questions:

lilleyse commented 7 years ago

In part, the name "feature" is used to distinguish something that is independently visible and styleable. Naming every instance in the class hierarchy a feature may result in some confusion, as the non-geometry-backed instances do not have the same styling abilities.

As for the naming of class, I'm curious if you have any suggestions.

> Is it necessary to have class ids and names? If the class simply represents a set of fields/attributes, referencing the index within the CLASSES array is sufficient.

In the Cesium implementation PR (https://github.com/AnalyticalGraphicsInc/cesium/pull/4625) I removed the id. The name is useful when checking if a feature is an instance of a certain class, via isClass, isExactClass, getClassName.

> Why do we need support for multiple parents? In scene graph concepts each node can have only one parent. If for performance reasons geometries need to be shared among nodes, we can do this using GLTF cross references and reusing vertex data.

Multiple parents are useful for grouping instances in more flexible ways. One example might be to group a random half of the instances into a "classifier_a" class and the others into a "classifier_b" class, in addition to the existing hierarchy.
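A sketch of how such a classifier could be queried alongside the regular hierarchy. All of the data and the function hasClass are hypothetical and only illustrate the idea; this is not the code from the Cesium PR, and it assumes the parent graph has no cycles.

```javascript
// Hypothetical data: four doors, one wall, and two classifier items acting as tags.
const CLASS_NAMES = ['door', 'wall', 'classifier_a', 'classifier_b'];
const CLASS_ID = [0, 0, 0, 0, 1, 2, 3];
// Parent lists per item, already decoded from PARENT_COUNT / PARENT_ID.
// Each door has the wall (item 4) and one classifier (item 5 or 6) as parents.
const PARENTS = [[4, 5], [4, 5], [4, 6], [4, 6], [], [], []];

// True if the item itself, or any item reachable through its parents,
// belongs to the class with the given name.
function hasClass(itemId, className) {
    if (CLASS_NAMES[CLASS_ID[itemId]] === className) {
        return true;
    }
    return PARENTS[itemId].some((parentId) => hasClass(parentId, className));
}

// All doors tagged with classifier_a.
const doorsInA = CLASS_ID
    .map((classId, itemId) => itemId)
    .filter((itemId) => CLASS_ID[itemId] === 0 && hasClass(itemId, 'classifier_a'));

console.log(doorsInA); // items 0 and 1
console.log(hasClass(2, 'wall')); // true: the wall is still a parent of every door
```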

arneschilling commented 7 years ago

In our application, we need a feature hierarchy first, not a class hierarchy.

Example of class hierarchy would be:

Object -> ManMadeObject -> Building -> ResidentialBuilding -> Villa

(the Villa is an instance of ResidentialBuilding as well as ManMadeObject. If you want to style all ResidentialBuildings you could use this class inheritance information)

Example of feature hierarchy:

City -> Building -> BuildingPart -> Wall -> Door -> DoorKnob

(the DoorKnob is part of a Door, which is part of a Wall, which is part of a BuildingPart, etc. If you want to style all elements of a particular wall, you could use this grouping information)

I would suggest making a clear distinction between these two concepts.

In the examples above, the instances make the features. CLASSES are classes.

lilleyse commented 7 years ago

Thanks for breaking it down. The spec may need to cover both concepts as use cases for the hierarchy. The cases don't need to be treated differently from a spec/implementation point of view though, and can even operate simultaneously with multiple parents.

pjcozzi commented 7 years ago

If anyone wants to review the spec for this, see https://github.com/AnalyticalGraphicsInc/3d-tiles/pull/171

pjcozzi commented 7 years ago

Thanks for everyone's input. #171 was merged.