buildingSMART / NextGen-IFC

Minimize Relationships #12

Open jmirtsch opened 4 years ago

jmirtsch commented 4 years ago

https://forums.buildingsmart.org/t/ifc-modernization-relationships/1462

I'd like to see many relationships removed, or at least a change that removes Unique Ids from them (by changing the inheritance or making that attribute optional).

SergejMuhic commented 4 years ago

I see both as necessary (removing relationships and Unique Ids). In my view, the main difference between relationships should be boiled down to 'one to one', 'one to many' and 'many to many' (I guess three different relationships). The constraints on which IfcObject is referred to can be specified in MVDs.

berlotti commented 4 years ago

Good topic!

The objectified relations were introduced as a solution to make it easy to add relations between objects without having to change the objects. This keeps as many objects in IFC unchanged as possible, while adding relations by adding additional objects. With the modularization and the shift to possible new technologies to maintain IFC, this is the time to reevaluate this solution and consider having direct relations again.

Looking forward to other input on this!

SergejMuhic commented 4 years ago

If we can find a solution for direct relations, that would be awesome. @jmirtsch, maybe the topic with generics from the MSG forum should also be copied here?

pipauwel commented 4 years ago

There seem to be two issues in one here (too many relationships; many-to-many vs. direct relations). I suggest splitting the issue in two.

In my understanding, Relations are currently more important in the current schema than the actual objects - even if the vast majority of the IFC audience first looks at objects, and second at relations. This is very confusing and difficult to explain (e.g. starting developers and students).

I am in favor of changing this. A vast number of relations in IFC (if not all) are designed as many-to-many relationships, thus allowing everything to be linked to everything using an intermediate IfcRelationship object. Links point down from IfcRelationship via RelatingObject and RelatedObjects, which are amazingly unspecific and unclear but generic names. However, a big number of Relationships does not need to be modelled as many-to-many relationships (some do). A site has a number of buildings; a building is not on multiple sites.

So this:

    {
       "Class": "Project",
       "GlobalId": "cb78a8c2-fb1e-4e12-8f29-6c0d7c39ca0b",
       "Name": "Default Project",
       "Description": "Description of Default Project"
    },
    {
       "Class": "Site",
       "GlobalId": "f07e69ce-3709-4ef5-a029-e27de7e95991",
       "Name": "'TU/e campus'",
       "Description": "'The High Tech campus of the Eindhoven University of Technology'",
       "CompositionType": ".ELEMENT.",
       "RefElevation": 0
    },
    {
       "Class": "RelAggregates",
       "RelatingObject": {
          "Class": "Project",
          "ref": "cb78a8c2-fb1e-4e12-8f29-6c0d7c39ca0b"
       },
       "RelatedObjects": [
          {
             "Class": "Site",
             "ref": "f07e69ce-3709-4ef5-a029-e27de7e95991"
          }
       ]
    }

can instead be this:

    {
       "Class": "Project",
       "GlobalId": "cb78a8c2-fb1e-4e12-8f29-6c0d7c39ca0b",
       "Name": "Default Project",
       "Description": "Description of Default Project",
       "isDecomposedBy": [
          {
             "Class": "Site",
             "GlobalId": "f07e69ce-3709-4ef5-a029-e27de7e95991",
             "Name": "'TU/e campus'",
             "Description": "'The High Tech campus of the Eindhoven University of Technology'",
             "CompositionType": ".ELEMENT.",
             "RefElevation": 0
          }
       ]
    }
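
The rewrite from the first form to the second can be sketched in a few lines of Python. This is only an illustration: it assumes the flat list uses the `Class`, `GlobalId`, `RelatingObject`, `RelatedObjects` and `ref` keys from the snippets above, the function name `inline_aggregates` is hypothetical and not part of any IFC tooling, and the short ids `p1` and `s1` are placeholders.

```python
import copy


def inline_aggregates(entities):
    """Replace RelAggregates objects by a nested "isDecomposedBy" list.

    Assumes every related object takes part in at most one aggregation
    (one-to-many); otherwise the data cannot be nested as a tree.
    """
    by_id = {e["GlobalId"]: copy.deepcopy(e) for e in entities if "GlobalId" in e}
    children = set()
    for rel in entities:
        if rel.get("Class") != "RelAggregates":
            continue
        parent = by_id[rel["RelatingObject"]["ref"]]
        for related in rel["RelatedObjects"]:
            child_id = related["ref"]
            if child_id in children:
                raise ValueError(f"{child_id} is aggregated more than once")
            children.add(child_id)
            parent.setdefault("isDecomposedBy", []).append(by_id[child_id])
    # Only objects that are nobody's child remain at the top level.
    return [e for gid, e in by_id.items() if gid not in children]


flat = [
    {"Class": "Project", "GlobalId": "p1", "Name": "Default Project"},
    {"Class": "Site", "GlobalId": "s1", "Name": "TU/e campus"},
    {
        "Class": "RelAggregates",
        "RelatingObject": {"Class": "Project", "ref": "p1"},
        "RelatedObjects": [{"Class": "Site", "ref": "s1"}],
    },
]
nested = inline_aggregates(flat)
```

The `ValueError` branch marks exactly the case where nesting stops working: an object that is related from more than one relationship.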

Furthermore, in response to the original issue post: several IfcRelationships are never used. Here are all of them. I suggest removing those which are never used. For example IfcRelReferencedInSpatialStructure? Anyone?

image

jmirtsch commented 4 years ago

I would describe most relationships as one to many, rather than many to many, i.e. one material related to multiple instances/types. One attribute is plural, the other singular (i.e. RelatingObject and RelatedObjects). Retention of many to many relationships is likely (or a better strategy).

The number of relationships should be rationalized. As a note the pending infrastructure extension will make IfcRelReferencedInSpatialStructure more commonly used.

pipauwel commented 4 years ago

One to many relations don't need an intermediate object. They just need an attribute pointing to a LIST or SET (or BAG or ARRAY). So... the RelatingObject and RelatedObjects from a RelSomething object: that is a one-to-many relationship modelled using a many-to-many relationship structure. That is unneeded.
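
Whether a given relationship is really used as one-to-many can be checked on actual data. A minimal sketch, assuming relation objects shaped like the `RelAggregates` JSON earlier in this thread (`RelatingObject` with a `ref`, `RelatedObjects` as a list of `ref`s); the helper name and the sample ids are made up for illustration.

```python
from collections import defaultdict


def related_with_several_parents(relations):
    """Map each related id to its relating ids, keeping only ids with more than one."""
    parents = defaultdict(set)
    for rel in relations:
        for related in rel["RelatedObjects"]:
            parents[related["ref"]].add(rel["RelatingObject"]["ref"])
    return {rid: sorted(p) for rid, p in parents.items() if len(p) > 1}


relations = [
    {"RelatingObject": {"ref": "s1"}, "RelatedObjects": [{"ref": "b1"}, {"ref": "b2"}]},
    {"RelatingObject": {"ref": "s2"}, "RelatedObjects": [{"ref": "b2"}]},
]
shared = related_with_several_parents(relations)
```

An empty result means the relation is only ever used one-to-many in that data set, so a plain list-valued attribute would carry the same information.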

jmirtsch commented 4 years ago

We're in agreement here, I misread your description in the post prior.

SergejMuhic commented 4 years ago

"However, a big number of Relationships does not need to be modelled as many-to-many relationships (some do)." - This. Looks like we are in agreement.

HerbertDobernig commented 4 years ago

"I see both as necessary (removing relationships and Unique Ids)."

Is this in contradiction to "To guarantee a future proof IFC, the schema needs to: .... 3. remove circular references and possibly add identifiers to entities that don’t have any now;" https://github.com/buildingSMART/NextGen-IFC/wiki/Towards-a-technology-independent-IFC ?

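I am an advocate of identifiers because they allow segmentation of information as used in

  • distributed and federalized CDEs
  • database normalization towards "5 NF"
  • graph databases
  • RDF / OWL
  • declarative programming
  • logic programming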

HerbertDobernig commented 4 years ago

"A site has a number of buildings; a building is not on multiple sites."

Most relationships in real life appear, in the final analysis, to be many-to-many relationships. One-to-one and one-to-many relationships are often an (over-)simplification of real life to make life easier for software-development.

Think about two different projects (different owners) on two neighboring sites. The buildings on both sites share a transformer station (or heating station) located across the border between the two sites. Perhaps this is the situation where a building (the station) is on multiple sites.

I am an advocate of (objectified) many-to-many relationships to sustain flexibility.

pipauwel commented 4 years ago

@HerbertDobernig... with the best of understanding (same discussion happened at length in the W3C LBD group), but...

The most flexible data model has things that are related through relationships, have IDs and changeAction metadata. But that data model is also near to empty and leaves everything to an end user (so no standardisation of terms), who will then likely resort to other data models because nothing is really defined and everything is possible. I mean, a Site, Space, Building, Building Storey, are they not all just 'Zones', related to each other? Or potentially we can simply distinguish between PhysicalThings and NonPhysicalThings and ground this in an upper ontology. And everything can be linked to anything. I am sure you don't mean to go that extreme, but that is the extreme version of this trend.

And we can't build any software from it.

"... to make life easier for software-development." I think that IFC should actually aim to allow 'easy' software development, to enable easy and insightful data exchange between software. I think we should make our choices based on the data exchanges requested in practice. And yes, we then lose certain flexibility, which is not bad if we want to build software with this data model.

Anyhow, these are very opposing views towards data modelling - probably out of scope here - but to be decided anyway.

pipauwel commented 4 years ago

I am an advocate of identifiers because they allow segmentation of information as used in

  • distributed and federalized CDEs
  • database normalization towards "5 NF"
  • graph databases
  • RDF / OWL
  • declarative programming
  • logic programming

Refer to #26 - I am also an advocate of identifiers (UUIDs, not internal mechanisms) for the same reason.

hlg commented 4 years ago

A lot of the relationships are actually ternary or higher and thus cannot simply be converted to binary associations. In IFC4 Add2 TC1 this is the case for 15 subtypes of IfcRelationship, hence these 15 have at least one attribute in addition to the relating and related entities. A prominent example is IfcRelSpaceBoundary. In total there are 27 such additional attributes, one of which is mandatory: IfcRelInterferesElements.ImpliedOrder (Tauscher & Crawford 2018). I don't see, however, why binary n:m relations would necessarily have to be kept objectified.

"For example IfcRelReferencedInSpatialStructure? Anyone?"

Interestingly you picked one of the many-to-many cases here: This relation is meant to assign elements that span various storeys such as curtain walls, elevator shaft construction, or stairs and ramps for split-levels to more than one storey, while the counterpart IfcRelContainedInSpatialStructure is one-to-many only.

I would not consider the observation of how often some concept is used a good indicator of whether it should be removed from the standard. Even if observed in a very large sample, this may be due to factors outside of the data model as such, for instance if the concept is not in the certification MVD and thus overlooked by implementers. Rather, before suggesting removal of any particular relationship, I believe it should be looked at and its semantics fully understood. Also note that the relations are not at all generic despite the systematic naming of the relatingXXX and relatedYYY, where XXX and YYY are derived from the entity types that this relationship is supposed to relate.

I appreciate the clear distinction between which relationships are objectified and which are not in current IFC. If we de-objectified some of the relationships outside of the resource layer (that is, between globally identifiable entities) and left others objectified (those with additional attributes), that may cause more confusion and pain upon implementation.

pipauwel commented 4 years ago

If the proposal to maintain IfcOwnerHistory in #16 is followed, then this entire objectification discussion is obsolete. If I am not mistaken (typing on a mobile device), each IfcRelationship is linked to an IfcOwnerHistory object, thus requiring it to be objectified.

In my opinion, buildingSMART should simply be more clear whether it takes a fully objectified and late binding approach (see also #28), which I think is the case, or not, and then simply stick to it.

Consequences in XML, JSON, and RDF are then what they are, and that is just it (see technology independence).

HerbertDobernig commented 4 years ago

@pipauwel I would like to better understand your arguments and your viewpoint. What consequences in XML, JSON, and RDF do you expect?

HerbertDobernig commented 4 years ago

"Objectified relationships are the preferred way to handle relationships among objects. This allows to keep relationship specific properties directly at the relationship and opens the possibility to later handle relationship specific behavior."

https://standards.buildingsmart.org/IFC/DEV/IFC4_2/FINAL/HTML/schema/ifckernel/lexical/ifcrelationship.htm

This is a clear and argued statement but apparently under discussion. I am interested in reading all the pros and cons discussed here.

HerbertDobernig commented 4 years ago

"In my understanding, Relations are currently more important in the current schema than the actual objects - even if the vast majority of the IFC audience first looks at objects, and second at relations. This is very confusing and difficult to explain (e.g. starting developers and students)." (statement from @pipauwel)

I agree that relations became more important than object attributes. And yes, I had my difficulties with this "ambivalence" as I started to learn IFC. Then I tried to write IFC files manually and to check what attributes are needed at minimum to get my testcase-building (4storeys-16walls-64windows) imported into three common authoring software programs. It was disillusioning to end up with a vast number of $ characters (undefined attributes).

To me, this obvious shift from object (or class) attributes towards relations is an indicator of the inadequacy of a strict classification system if IFC shall advance from pure BIM-CAD exchange to a universal exchange specification covering many different aspects of the supply chain in the construction industry.

To my mind, standardization shall focus on defining default and best-practice relationships based on construction domain expert consensus to avoid uncontrolled generation of proprietary relationships.

pipauwel commented 4 years ago

"@pipauwel I would like to better understand your arguments and your viewpoint. What consequences in XML, JSON, and RDF do you expect?"

I don't want to go into too much opinionating. I think this thread needs to return to the original request by @jmirtsch, namely to remove many relationships and their IDs (not sure why).

But okay... my arguments and viewpoint: In trying to model many things in a single system, and allowing as much flexibility as possible, more and more objectification typically tends to happen. This is probably particularly recognizable in SQL databases (mapping tables for many-to-many relations; attribute or extra table). This is only normal and justified.

Yet, it also has some effects on the data model itself, if done too extremely:

  1. The actual data ('wall', 'space', 'thermal resistance', ...) gradually gets pushed out of the schema (late binding), leading to a schema about 'things', 'properties' and 'metadata', while standardisation of terminology and structure about the construction and infrastructure disappears (the leaves of the data model move further and further away). I think that standardisation of that content is also needed, in any case.
  2. Complexity scares away the more common user (developer).

I think that the above is what IFC should be, including those two risks. It is its core use case: full file-based data exchange between two systems ('interoperability'). And the majority of IFC users require precisely this (via vendors).

Main effect for XML and JSON: Many-to-many relationships everywhere are typically handled in XML and JSON by (1) avoiding them, (2) deleting them, or (3) relying on an ID/IDREF mechanism. If all relations are objectified, then it is almost impossible to aim for a clean tree structure in XML and JSON (like gbXML). With this extreme objectification of relations and data, all data just needs to be listed in a flat tree and everything is linked using an ID mechanism (which I think should then in any case be UUIDs for the sake of transactional exchanges and distributed data). See also #26. An example of such an XML file can be found at https://pi.pauwel.be/files/randomProject_IFC2X3CV.ifcxml. I don't think this is usable?
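
The "flat list plus ID mechanism" layout described above can be illustrated by taking a nested tree and flattening it into objects and objectified relations. This is a sketch, assuming the `isDecomposedBy` key and the `RelAggregates` shape from the JSON example earlier in the thread; `flatten` is a hypothetical helper and the short ids are placeholders.

```python
def flatten(node, out=None):
    """Flatten a nested object tree into a list of objects and RelAggregates."""
    if out is None:
        out = []
    children = node.get("isDecomposedBy", [])
    # Copy the object itself without its nested children.
    out.append({k: v for k, v in node.items() if k != "isDecomposedBy"})
    if children:
        out.append({
            "Class": "RelAggregates",
            "RelatingObject": {"Class": node["Class"], "ref": node["GlobalId"]},
            "RelatedObjects": [
                {"Class": c["Class"], "ref": c["GlobalId"]} for c in children
            ],
        })
    for child in children:
        flatten(child, out)
    return out


tree = {
    "Class": "Project",
    "GlobalId": "p1",
    "isDecomposedBy": [
        {
            "Class": "Site",
            "GlobalId": "s1",
            "isDecomposedBy": [{"Class": "Building", "GlobalId": "b1"}],
        }
    ],
}
flat = flatten(tree)
print([entry["Class"] for entry in flat])
```

Three nested objects become five flat entries, and a reader has to follow every `ref` to recover the hierarchy that the tree showed directly.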

If then, on top of that, all those objectified things (e.g. relationships and ownerhistory as prime examples) are always left empty and not used, then this seems to be a big price paid for little added value really.

XML and JSON are primarily used for transactional exchanges. It seems to me that there would be a preference in such approaches, by those developers, to exchange data very concretely and directly, in small snippets, rather than using very large files full of objectified relationships with local IDs. So I think that transactional data exchange (the 2nd use case of IFC, besides the main one mentioned above) needs an IFC with less objectification.

Main effect on RDF: The problem is a lot less significant here. The RDF graph-based data model is conceptually a lot like EXPRESS/SPF. Also in an RDF world, there are proponents and opponents of objectified relationships. Overall implications can probably be seen in the below 3 files (increasing objectification, with the last one using the IFC ontology in OWL). I am personally more inclined to use the first file, especially in transactional exchanges (less needed in an RDF world, of course).

Repeating: I don't want to go into too much opinionating. I think this thread needs to return to the original request by @jmirtsch, namely to remove many relationships and their IDs.

HerbertDobernig commented 4 years ago

@pipauwel Thx for your explanations. This will help me for further discussions outside this thread.

EAzari commented 4 years ago

This is the second time I mention this: "First of all, you have to choose a language, UML? SysML? ..." The majority of them (also IFC) are based on Entity/Class, Relationship, and Attribute/Property (and attribute on relationship) (ERA), and even OWL/RDF follow the same structure.

Personally I think UML has limitations, this is why today STEP expands based on SysML, however, SysML is advanced and if you today don't choose it, the industry will choose it among other available solutions, especially for automation and control purposes

Also, in the bSI forum, I maintained 5WH model which usually is a base for enterprise architecture

hlg commented 4 years ago

Let me point out another aspect of direct versus objectified relationships. This is about extensibility at design time and modularity at runtime. The question is: can we add relationships to a single module without affecting independent modules? Note I am talking about an extension that takes place in the schema at design time, not at runtime, thus there is no need for a late-binding implementation. Yet, we may want to keep instances modular (at runtime). I hope the following example can illustrate this.

We have an independent base module 0 with two types A and B and two dependent modules 1 and 2 (which are mutually independent). Module 1 adds some relationship C between types A and B. Module 2 adds a type D with an association to B, but that's not really relevant. Now, there are two ways to model the relationship in module 1: the "direct" way (1a, left side of image) and the "objectified" way (1b, right side of image).

-- module 0
ENTITY A; END_ENTITY;
ENTITY B; END_ENTITY;

-- module 1a
ENTITY Bx SUBTYPE OF (B);
  c: A;
END_ENTITY;

-- module 1b
ENTITY C;
  c1: A;
  c2: B;
END_ENTITY;

-- module 2
ENTITY D;
  b: B;
END_ENTITY;

[image: 20200516_inheritance-composition_xs]

Now consider two applications 1 and 2, which both implement module 0 and either module 1 or module 2 as an extension. All fine and mellow on the schema level. But when it comes to instances and data exchange, the modularity may appear broken.

Let's look at an instance produced by application 1 with the schema variant 1a:

#1=A();
#2=Bx(#1);

and with schema variant 1b:

#1=A();
#2=B();
#3=C(#1,#2);

Notice that variant 1b produces a modular instance, 1a does not. Application 2 could consume both entity instances of the base module's A and B type for variant 1b, but not for variant 1a.
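
The effect on a consuming application can be made concrete with a toy reader. This is a sketch under strong assumptions: a simplistic regular expression stands in for a real Part 21 parser, and application 2 is modelled as a reader that only keeps instances whose type name it knows (A, B and D from modules 0 and 2). The names `KNOWN_TO_APP2` and `read_known` are invented for this illustration.

```python
import re

KNOWN_TO_APP2 = {"A", "B", "D"}  # module 0 plus module 2
LINE = re.compile(r"#(\d+)=(\w+)\((.*)\);")


def read_known(p21_lines, known=KNOWN_TO_APP2):
    """Return {instance id: type name} for instances of known types only."""
    result = {}
    for line in p21_lines:
        match = LINE.fullmatch(line.strip())
        if match and match.group(2) in known:
            result[int(match.group(1))] = match.group(2)
    return result


variant_1a = ["#1=A();", "#2=Bx(#1);"]
variant_1b = ["#1=A();", "#2=B();", "#3=C(#1,#2);"]

print(read_known(variant_1a))  # {1: 'A'}
print(read_known(variant_1b))  # {1: 'A', 2: 'B'}
```

To recover the B instance from variant 1a, the reader would need to know that Bx is a subtype of B, which is knowledge about module 1.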

If you wonder about my definition of modularity: Modules are partitions (here: of the schema) that exhibit some defined notion of independence amongst them, hence you can take away or change particular modules without affecting others. If change in or removal of one module affects another module, then the second is depending on the first. When it comes to data exchange we could extend that definition to the instance level: If a software application requires knowledge about one module to process instances of another module, then the latter module depends on the first. Inheritance proves to be tricky in this regard.

Three more side notes: I have used EXPRESS and P21 to notate sample schema and instances for their compactness and brevity, but this applies equally to other modelling languages and implementation methods. This issue is not restricted to objectified relationships, but they are an obvious manifestation. We had a similar discussion and made a controversial design choice when extending CityGML with an ADE; modelling was done in UML and the implementation method was XML.

TLiebich commented 4 years ago

Thanks @hlg, this was exactly one of the two reasons why objectified relationships were introduced to IFC in the first place: to allow for a modular implementation. IFC was meant to be based on modules (hence the IFC schema architecture) that were implemented using the EXPRESS SCHEMA concept (a module in UML terms). But when it came to implementation in software, the (at that time) easier way of combining everything into a single longform schema was chosen.

But now the second reason for having chosen objectified relationships. It was connected to the original MVD idea of defining valid subsets of the total EXPRESS longform that need to be implemented for reaching a particular conformance level. Those subsets should be minimal, in order to minimize implementation efforts. Objectified relationships were seen as the way to create such subsets by minimizing the dependencies among the classes.

Let's take the example from @hlg further by adding one subtype to B that adds an additional attribute n.

-- module 1a
ENTITY Bxn SUBTYPE OF (Bx);
  n : STRING;
END_ENTITY;

-- module 1b
ENTITY Bn SUBTYPE OF (B);
  n : STRING;
END_ENTITY;

For brevity, the implementation subset (of the MVD) shall consist only of B(x)n. It would be created as follows:

-- module 1a
REFERENCE FROM 1a
 (Bxn);

-- module 1b
REFERENCE FROM 1b
 (Bn);

The difference now is that the generated sub schemas contain:

leading to a much smaller sub schema, since the relationships are only included, if explicitly referenced.
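
The size difference between the two subsets can be illustrated with a small dependency closure. This is a sketch, not a statement of how any EXPRESS tool works: it assumes that referencing an entity also pulls in its supertypes and the entity types of its attributes, and it encodes the two schema variants from this example as plain dictionaries. The dictionary layout and the function name `closure` are invented for the illustration.

```python
def closure(schema, roots):
    """Entities reachable from roots via supertypes and attribute types."""
    seen = set()
    todo = list(roots)
    while todo:
        name = todo.pop()
        if name in seen:
            continue
        seen.add(name)
        entity = schema[name]
        todo.extend(entity["supertypes"])
        todo.extend(entity["attribute_types"])
    return seen


# Variant 1a: the relation is a direct attribute of Bx.
schema_1a = {
    "A": {"supertypes": [], "attribute_types": []},
    "B": {"supertypes": [], "attribute_types": []},
    "Bx": {"supertypes": ["B"], "attribute_types": ["A"]},
    "Bxn": {"supertypes": ["Bx"], "attribute_types": []},
}

# Variant 1b: the relation is the separate entity C.
schema_1b = {
    "A": {"supertypes": [], "attribute_types": []},
    "B": {"supertypes": [], "attribute_types": []},
    "C": {"supertypes": [], "attribute_types": ["A", "B"]},
    "Bn": {"supertypes": ["B"], "attribute_types": []},
}

print(sorted(closure(schema_1a, ["Bxn"])))  # ['A', 'B', 'Bx', 'Bxn']
print(sorted(closure(schema_1b, ["Bn"])))   # ['B', 'Bn']
```

Under that assumption, the direct variant drags A into the subset through the attribute on Bx, while the objectified variant leaves both A and C out unless they are referenced explicitly.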