Conversion "BuildingInstallation" from GML to JSON

hugoledoux commented 5 years ago

I downloaded a small part of Rotterdam-3D (https://www.dropbox.com/s/mcf5jen1hper0nh/9300_4400.gml?dl=0) and if I convert with citygml-tools then the CityJSON file contains BuildingInstallations that have an empty geometry, while in CityGML there is a geometry.

See eg gml:id=UUID_d3aa9299-874b-405c-98c1-60f852445365

Actually, in CityJSON the geometry of the Installation is in the building itself, and the geometry of the Installation is empty.

A non-documented feature or a bug?

clausnagel commented 5 years ago

Bug :-) Thanks for reporting.

clausnagel commented 5 years ago

Interesting example. What would you expect the CityJSON output to look like?

The surfaces of the BuildingInstallations are referenced from the gml:Solid of the corresponding Building feature in CityGML. Thus, in CityJSON they must also be part of the solid geometry of the building itself, otherwise the solid would have holes and thus would be invalid. And since they are semantic surfaces, they are listed in the "semantics" property of the building geometry.

According to your issue, the surfaces should also appear as separate multisurface geometry of the CityJSON "BuildingInstallation" object. And since they are still semantic surfaces, I assume that this geometry also must have a "semantics" property listing the surfaces.

This means that the surfaces and the "semantics" properties of the building geometry and of the installation geometry are partly redundant. I have created an example of how I think the CityJSON result should look like for one of the CityGML buildings: ID_0599100000223738.zip

Do you agree with this, or would you say that the redundant information must be avoided (if so, how?).

clausnagel commented 5 years ago

I loaded the above CityJSON file into the CityJSON web viewer. It seems that the redundant surfaces in the geometry of the building and of the building installation are rendered twice. Am I right with this? I think this is rather unwanted?

hugoledoux commented 5 years ago

I had never thought about this, interesting example indeed.

One question: if the BI was an antenna (thus not formed of semantic surfaces) then your code would put those geometries in the BI in CityJSON, right?

Just a few thoughts because I don't know what we should do here, just thinking out loud:

there should be a way to identify the dormer, shouldn't there be?
having redundant information is probably not what we want
what about having a new semantic class DormerSurface? I mean, are dormers really a new installations anyway? They are part of the roof's structure, and shouldn't be a separate City Object I guess.
Only antennas and chimneys (small ones, I am aware that large ones could suffer the same issue as dormers) and balconies are BI.

Your thoughts?

balazsdukai commented 5 years ago

Very interesting issue. initially I've written a long argument on dormers not being BuildingInstallations and thus creating a new semantic class for them, but the more I think about this, the more I believe that in the current case, where the BuildingInstallation is a dormer, the CityJSON should:

have two objects,
- one with the geometry of the Building, without the geometry of the BuildingInstallation. Yes, that would make the Solid of the Building invalid, because there is a hole in place of the dormer. However, in "real world" if we take away the dormer, there will be a hole too.
- one with the geometry of the BuildingInstallation, semantics included and all.

The union of the two yields a valid Solid in the current case. However, in case the BuildingInstallation is not a structural part of the building (removing it doesn't create a hole), then their union won't be a valid Solid (eg. airco units). Thus difficult to validate the citymodel with BuildingInstallations. But I think this is a different discussion.

And to generalize, I would always exclude the geometry of the BuildingInstallation from the geometry of its parent.

clausnagel commented 5 years ago

I had quite similar first thoughts for a general CityJSON representation. I would just be a bit more strict regarding solids:

Every child object shall have its own geometry that is separate and excluded from the parent's geometry. If you just want to semantically classify the surfaces of the parent's geometry, then don't use a child but the semantics property of the parent's geometry.
If a child's geometry shares a common boundary with its parent, and in reality this common boundary is a "hole", then this "hole" must be closed if "Solid" geometries should be used. For this purpose, the "hole" shall be closed with a surface geometry that shall be classified as "ClosureSurface". The closure surface must be modelled for both the child's and the parent's solid geometry. It will have a different orientation in either context, so it is not really redundant.

So, if a dormer should be an identifiable object, it shall be modelled as "BuildingInstallation" and then must follow the above rules. Alternatively, if you just want to represent the dormer geometrically, then it is only modelld as part of the "Building" geometry and there is no additional "BuildingInstallation". I would not introduce an additional semantic "DormerSurface".

The exact same modelling is possible (but not enforced) in CityGML 1.0 / 2.0. I can adapt citygml-tools so that CityGML data structured this way can be losslessly converted into CityJSON and back. This would be my preferred default case.

Now, the above Rotterdam data does not follow this modelling, but yet is a valid CityGML representation with valid solid geometries. I could think of the following mapping:

The CityGML bldg:BuildingInstallation is mapped onto a "BuildingInstallation" in CityJSON. It gets its own geometry with its own "semantics" property. The geometry would be a "MultiSurface" (like it is implicitly in CityGML) - so no need for adding a closure surface.
The "Building" geometry would consequently not contain the surfaces of the "BuildingInstallation". This leaves an hole in the solid, which should be closed by a closure surface to be valid. But closing holes is a non-trivial operation and goes beyond the scope of citygml-tools. I see two options:
1. A "MultiSurface" is written for the building instead of a "Solid".
2. Although being invalid, a "Solid" is written to the CityJSON file.

Well, I do prefer the first option. Yes, the resulting CityJSON would lack the closed volume geometry from the original CityGML file, but it would be valid.

hugoledoux commented 5 years ago

I've been thinking about this, and I like neither of your 2 options, sorry.

I know that these are CityGML-compliant, but both reduce the usability of the data in my opinion. (I agree that having a ClosureSurface would solve it elegantly, but as you point out it's complex to crunch the data and out-of-scope for citygml-tools).

I would argue that having solids (thus watertight) for buildings is far more important in practice than being able to count the number of dormers. Perhaps I am wrong, but who and why does one need to count dormers? While having a watertight solid allows us to (1) calculate volumes, (2) define clearly what the interior/exterior is (and thus convert easier to other formats).

If we labelled the surfaces as DormerSurface then we could, not that simply, still be able to count the number of dormers.

So I would actually favour the status quo: dormers are part of the roof.

Also, notice that identifying where a dormer "starts" and/or taking it "apart" from the main building to create a BuildingInstallation can be tricky in practice, see these 2 cases:

clausnagel commented 5 years ago

Ok, I agree that identifying dormers in practice can be tricky. I would leave it to the user to decide based on his or her requirements and applications whether dormers should be represented as identifiable objects in the model. So I would suggest to skip this more general discussion here.

Regarding the data conversion: Yes, I absolutely agree with the advantage of having a watertight building geometry. My last post was just trying to identify general requirements for modelling parent and child objects, and what that would mean for converting the dataset in question.

So, based on the current result of citygml-tools, the empty BuildingInstallation could be simply deleted from the final output. Et voilá, the result is a wateright building geometry with classified roof and wall surfaces.

The information about the dormer would then be missing. DormerSurface might help, but I have open questions:

The example dataset does not further classify the building installation. So for citygml-tools, it could also be an antenna or a balcony. How should the creation of a DormerSurface be triggered in this example?
The building installation is currently represented via wall and roof surfaces. What is your intended use for DormerSurface?
- Should every individual roof and wall surface be mapped onto a separate DormerSurface? This would mean that every building installation would result in a collection of DormerSurfaces. If so, then wouldn't DormerRoofSurface and DormerWallSurface be more appropriate in this example? On the other hand, counting individual dormers based on this representation would be more difficult. And what if the original building installation has attributes assigned? Where would they go? Should they be copied to every resulting dormer surface? Or simple be skipped?
- Or should there only be one DormerSurface per building installation? This would make counting easy, and attributes of the building installation could be directly assigned to the DormerSurface. However, the information that the dormer has walls and roofs would be lost in this scenario. And note that also wall and roof surfaces can carry attributes in the original CityGML file. How would they be mapped if there is only one DormerSurface?

I am really open to this. What would be your preferred representation?

And one more thought. In my first post I attached a CityJSON file where the dormer is represented as BuildingInstallation and the building geometry is watertight. This solves all above questions, because the dormer geometry can still be classified as roof and walls, and attributes can be assigned to both the BuildingInstallation and its semantic surfaces. The only drawback of this file is that the surfaces of the dormer are redundantly represented for the BuildingInstallation and the Building.

Is this really a drawback given all the questions with other representations?
Also this redundancy could be solved: Imagine that CityJSON would not only have a vertices array, but also a surfaces array. Then the BuildingInstallation and the Building could use the same surfaces without redundancy... well, just a thought :-)

clausnagel commented 5 years ago

btw, sorry for the lengthy posts :-)

balazsdukai commented 5 years ago

Yes, I think we agree that watertight geometry is a priority. I would also add that having a data model/format that is easy to parse is at least as important. These two properties allow the users to at least have data set that they can read. Then its up to them to develop the code that identify the dormers or do any other analysis in this data. That's why I'm very confident about adding any Dormer* semantic. Because if we go down that road, then why not include all of the other architectural elements? So I prefer the more generic BuildingInstallation, in which a the data creator can define in an attribute the object that the BuildingInstallation represents.

Following, in the case of the example dataset I wouldn't specify any DormerSurface, because it's not in the original data set either. Although we identified that the installation is a dormer in fact by looking at model, but this is not possible to automate.

Then in a "minimal" alternative, I would keep the Bulding object as it is in the example that you sent @clausnagel, where the BuildingInstallation is part of the Building geometry with all its semantics, and remove the BuildingInstallation object completely. Then there is no redundancy, but we would loose information. However it leads to data set that is "simpler" to process, and delegates the responsibility of identifying the dormer to the end user/developer who wants to find the dormers on the roofs.

hugoledoux commented 5 years ago

Yeah, okay we/I are/am convoluting what should be the ideal case (should there be dormers as separate objects?) and the particular case of citygml-tools here. Let's focus solely on the latter.

I don't have strong feelings because nothing is perfect.

I agree that we don't know it's a dormer so impossible in this case to label surfaces as a dormer. My point was that in an ideal case, we should add semantics surfaces to the Building, and indeed DormerRoofSurface and DormerWallSurface would be smarter than just a DormerSurface. But here it's not relevant.

I would actually drop the empty BI and just keep the Building, and not have any redundancy in this case. But the small dataset you gave above is also perfectly fine to distribute, but I think it would create a bit more confusion in practice. That's all.

And we discussed having a list of surfaces for a CityJSON file. Smart in theory, but in practice I reckon it's a mess when one updates the file. I wrote the cjio merge() and subset() operators, and if I had had to select and update and delete vertices and surfaces IDs I'd have suffered a bit more. So at this moment I favour simplicity over "richness of representation", but who knows, maybe I could change my mind 😆

clausnagel commented 4 years ago

Sorry for being unresponsive. I have worked on this issue in the meantime though, and decided to fix it in the following way:

The default behaviour of the to-cityjson command is to map all CityGML top-level and child objects to corresponding CityJSON objects. Geometries and semantic surfaces will also be created for child objects even if this results in duplicate geometries.
The user can choose to avoid duplicate geometries for child objects with the new --remove-duplicate-child-geometries option. In case a child object has an empty geometry after removing duplicates, it will be removed from the CityJSON output.

Thus, the small dataset I provided as example above is now the default result. It is as close as possible to the original CityGML dataset as it still contains both the building and the building installation as well as the watertight volume geometry of the building.

If a user favours non-redundant geometry over mapping all features, then the --remove-duplicate-child-geometries option is the solution.

The bugfix is available in the latest version 1.3.2.

citygml4j / citygml-tools

Conversion "BuildingInstallation" from GML to JSON #7