Open aothms opened 2 years ago
Okay, here's a small update.
In https://github.com/Krande/ifcdb/commit/d7fadfe5da71e7ce074b406c4f0e291ceb5d8e63 I managed to modify my partial EXPRESS generator example to use the baked-in ifcopenshell schema definitions. With the tessellated-item.ifc test case I chose as a starting point, I managed to upload an IfcBuildingElementProxy element to the EdgeDB instance, then run a query and return data (which I had also managed with the original Python code).
I see that all the data I needed is already covered in the ifcopenshell schema, so this saves me a ton of time!
Regarding select types
These I managed to port to ESDL as basic unions, such as:
type IfcUnitAssignment {
required multi link Units -> IfcDerivedUnit | IfcMonetaryUnit | IfcNamedUnit;
}
However I see that for Unions of object types and base types (such as floats) I run into issues.
type IfcCartesianPoint extending IfcPoint {
required property Coordinates -> tuple<float32, float32, float32>;
}
type IfcTrimmedCurve extending IfcBoundedCurve {
required link BasisCurve -> IfcCurve;
required multi link Trim1 -> IfcCartesianPoint | float32;
required multi link Trim2 -> IfcCartesianPoint | float32;
required property SenseAgreement -> bool;
required property MasterRepresentation -> IfcTrimmingPreference;
}
Basically, properties pointing to objects are assigned with link, while base types are assigned with property.
I could solve this by instead of converting IfcParameterValue object to float in the generator instead create a IfcParameterValue object then assign that object a value property of float type. This might also make more sense given that this would follow the IFC Schema defined in the EXPRESS file a bit closer.
Nonetheless, I've asked a question on Unions of base types and objects on the EdgeDB discord "help" channel. If this is prohibited then so be it. For now I might just change the generator so that it does not automatically convert IfcParameterValue and other singular property assignment types directly to a base type representation. However I'll leave array data as floats.
I'll keep you posted on the progress here.
BTW: Toposort was a great suggestion!
I could solve this by instead of converting IfcParameterValue object to float in the generator instead create a IfcParameterValue object then assign that object a value property of float type. This might also make more sense given that this would follow the IFC Schema defined in the EXPRESS file a bit closer.
I think ultimately, this would be best. There is also the IfcPropertySingleValue.NominalValue attribute where the type IfcValue is a huge select including multiple defined types with the same underlying type, e.g IfcLabel and IfcIdentifier are both possible for IfcValue and are both string. So in order to be a reversible serialization you need to decorate the value with the type somehow.
(In HDF5 defined types are also more or less eliminated in the serialization. The HDF5 mapping singles out what it calls ambiguous selects. In the IfcTrimmingSelect case above you can deduce the defined type from the concrete data type (float means IfcParameterValue, instance means IfcCartesianPoint), so the defined type doesn't have to be annotated. In the case of IfcValue, however, you can't deduce the defined type from the value type, so the select definition there carries an additional string component holding the type name.)
The question here is the overhead though. There are very many defined types. What's the consequence in EdgeDB? Do they all end up as separate tables? That might result in fragmentation, and I'm not sure to what extent that is a problem. If fragmentation is an issue, I imagine you could also fold all the defined types with the same value type into a single type definition, like:
scalar type IfcDefinedFloatTypes extending enum<IfcLengthMeasure, IfcParameterValue, ...>;
type IfcDefinedTypeFloat {
required property Type -> IfcDefinedFloatTypes;
required property Value -> float32;
}
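To make the reversibility argument concrete, here is a minimal Python sketch of a value that carries its defined type name alongside the raw value. `TaggedValue`, `encode`, `decode` and the small `DEFINED_TYPES` table are hypothetical names used only for illustration, and the table lists just a handful of defined types.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Illustrative subset only: defined type name -> underlying Python type.
DEFINED_TYPES = {
    "IfcLabel": str,
    "IfcIdentifier": str,
    "IfcLengthMeasure": float,
    "IfcParameterValue": float,
}


@dataclass(frozen=True)
class TaggedValue:
    """A raw value decorated with the name of its defined type."""
    type_name: str
    value: Union[str, float]


def encode(type_name: str, value: Union[str, float]) -> TaggedValue:
    """Check the value against the underlying type and attach the type name."""
    expected = DEFINED_TYPES[type_name]
    if not isinstance(value, expected):
        raise TypeError(f"{type_name} expects {expected.__name__}")
    return TaggedValue(type_name, value)


def decode(tagged: TaggedValue) -> Tuple[str, Union[str, float]]:
    """Recover both the defined type name and the raw value."""
    return tagged.type_name, tagged.value


label = encode("IfcLabel", "Wall-01")
identifier = encode("IfcIdentifier", "Wall-01")
```

Both values hold the same string, but because the type name travels with the value they stay distinguishable, which is what a roundtrip back to IFC needs.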
However I'll leave array data as floats.
Intuitively I think that'd be the right approach.
required multi link
Is this an ordered collection? Keep in mind that in EXPRESS we have SET (unordered) and LIST (ordered), as well as BAG (not used) and ARRAY (ordered, fixed size).
I modified the generator to convert IfcParameterValue (and similar types) to native EdgeDB object types. The IFC schema elements (and related pointers) I've tried to convert so far are accepted by the EdgeDB type interpreter (I guess we'll find out if the type casting is correct once we start roundtrip testing!)
Regarding multi link, I believe it is an ordered collection (ref https://github.com/edgedb/edgedb/discussions/3271?sort=old).
I did an initial stress test on an EdgeDB instance running locally in a docker container, and I managed to upload 5000+ IfcBuildingElementProxy elements containing a lot of IfcTriangulatedFaceSet geometry. The source IFC file was approximately 500 MB, but a lot of that is just property attributes (which I haven't added yet).
My initial reaction is that it handles the data size just fine, but as far as I can tell it isn't designed for single-threaded IO speeds. So I ended up having to do data insertion across multiple threads. That way the upload time was reduced to a few minutes (and it can probably be done more efficiently).
I have also gotten tips on how to set up github actions to do unittesting on edgedb. So if edgedb proves to fit with our needs it is possible to set up unittesting of different roundtrips and queries/inserts.
I'll keep you posted on the progress!
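The multi-threaded insertion described above could be sketched roughly as follows. This is not the code from the repository: `run_batch` is a hypothetical callable that would open its own database connection, execute one batch of insert statements and return how many it executed, and the batch size and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def chunked(items: Sequence[str], size: int) -> List[Sequence[str]]:
    """Split a sequence of statements into consecutive batches."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def insert_parallel(
    statements: Sequence[str],
    run_batch: Callable[[Sequence[str]], int],
    batch_size: int = 100,
    workers: int = 8,
) -> int:
    """Run every batch on a thread pool and return the total number of inserts."""
    batches = chunked(statements, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(run_batch, batches))


# Stand-in for a real database call: it only counts the statements in a batch.
total = insert_parallel([f"stmt {i}" for i in range(250)], len, batch_size=100, workers=4)
```

Batches run independently, so this only suits objects that do not reference each other within the same upload pass.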
When you say that the complete representation is there except for the where rules, does that include the unique property as well? I can't immediately find an is_unique property or similar on the attributes.
Sorry for the late reply.
Does that include the unique property as well?
Not in the internalized data, but we do have access to it at the parse time (before compilation):
>>> import ifcopenshell.express
>>> m = ifcopenshell.express.express_parser.parse('IFC2X3_TC1.exp')
>>> m.schema.entities['IfcRoot'].unique
[('UR1', 'GlobalId')]
^ note the difference here between ifcopenshell.express.express_parser.parse and ifcopenshell.express.parse. The first returns the raw parse data from disk. The second returns a latebound schema definition that you can use to create models.
We could add it. It'd require changing the entity definition in our latebound schema definition
https://github.com/IfcOpenShell/IfcOpenShell/blob/v0.7.0/src/ifcparse/IfcSchema.h#L245
And then populate that in:
The first creates schema definitions at runtime using the classes wrapped in swig. The second creates C++ source code for the built in schemas.
I'm personally not convinced it's worth the effort. Would you really enforce the unique constraint in your database? That seems too restrictive with respect to models coming with duplicate ids, or scenarios where you have multiple versions of the same model in the same db.
I agree. I will not give uniqueness constraints any priority.
For the moment I am exploring how I should model union properties that can be either floats/booleans/strings. Take the ListValues property on the IfcIrregularTimeSeriesValue object as an example.
Currently EdgeDB doesn't accept a property on a schema object that is a union of base types (floats, bools, etc.).
Here's a simplified version of how it would look in EdgeDB schema:
module default {
type IfcAbsorbedDoseMeasure {
required property value -> float64;
}
type IfcBoolean {
required property value -> bool;
}
type IfcValue {
required property value -> IfcAbsorbedDoseMeasure | IfcBoolean;
}
type IfcIrregularTimeSeriesValue {
required property TimeStamp -> str;
required multi link ListValues -> IfcValue;
}
}
I have raised a question on discord for tips on how I might circumvent it: https://discord.com/channels/841451783728529451/849377751370432573/981203479425069136
But for now my plan would be to separate the IfcValue properties into separate properties per base type i.e.
type IfcValue {
required property valuefloats -> IfcAbsorbedDoseMeasure | more float types here etc.. ;
required property valuebools -> IfcBoolean | more boolean types here etc.. ;
}
Any thoughts?
separate properties per base type
That's more or less what the HDF5 serialization also does (but only the primitives and then real type as a string). In your case you'd have an integer specifying which field is actually used? It's unfortunate, but ultimately, this is often what happens. We're dealing with strictly typed column types after all in postgres, so I can understand some of these limitations leak through.
Keep in mind though that IfcValue also has some "bigger" value types such as IfcComplexNumber (array size 2) and IfcCompoundPlaneAngleMeasure (list of integer length 3 or 4). I guess you could cheat by leaving them out, but it's also unfortunate to introduce limitations so early on in the game.
But then isn't it possible to wrap every defined ifc type into a edgedb class, so that you're not dealing with base types directly?
The other option is just to store everything as string in this case. In the form proposed above you can barely use it directly in queries anyway, so why not store it as a combination of string and datatype?
But then isn't it possible to wrap every defined ifc type into a edgedb class, so that you're not dealing with base types directly?
Yes, this is something I hope I could get some pointers on from the EdgeDB help channel. I originally thought I had already fixed this in my implementation by wrapping the base types (as IfcAbsorbedDoseMeasure and IfcBoolean do in the example above). But I am sure that this ought to be possible, so I'll keep at it :)
I'll let you know how it goes!
Okay, I think I solved this now. I somehow forgot that I have to define object properties with "link" as opposed to "property". In addition, a union of different objects with different base types should not use the same internal property name (value) for the wrapped value.
The following works at least!
module default {
type IfcAbsorbedDoseMeasure {
required property IfcAbsorbedDoseMeasure -> float64;
}
type IfcBoolean {
required property IfcBoolean -> bool;
}
type IfcValue {
link value -> IfcAbsorbedDoseMeasure | IfcBoolean;
}
type IfcIrregularTimeSeriesValue {
required property TimeStamp -> str;
multi link ListValues -> IfcValue;
}
}
Now this should be fixed (or at least I've found other issues in need of fixing!).
What I am seeing now is issues with circular dependencies:
Toposort is giving me the following error:
Circular dependencies exist among these items:
{
'IfcFillAreaStyle': {
'IfcFillStyleSelect'
},
'IfcFillAreaStyleTiles': {
'IfcStyledItem'
},
'IfcFillStyleSelect': {
'IfcFillAreaStyleTiles'
},
'IfcPresentationStyleAssignment': {
'IfcPresentationStyleSelect'
},
'IfcPresentationStyleSelect': {
'IfcFillAreaStyle'
},
'IfcStyleAssignmentSelect': {
'IfcPresentationStyleAssignment'
},
'IfcStyledItem': {
'IfcStyleAssignmentSelect'
}
}
I believe this is related to IfcFillAreaStyleTiles pointing back to IfcStyledItem.
ENTITY IfcFillAreaStyleTiles
SUBTYPE OF (IfcGeometricRepresentationItem);
TilingPattern : LIST [2:2] OF IfcVector;
Tiles : SET [1:?] OF IfcStyledItem;
TilingScale : IfcPositiveRatioMeasure;
END_ENTITY;
Would it make sense to treat the Tiles property so that it isn't assigned a list of IfcStyledItem, but is instead more of a computed property that returns all the IfcStyledItem items it indirectly belongs to?
Have I done something wrong in my building of relationships or is this property by nature a circular dependency?
Hm, interesting. I wasn't aware of this.
It seems to be a change in IFC4, where they changed the Tiles attribute from a single select of IfcFillAreaStyleTileSymbolWithStyle to IfcStyledItem.
I think the first thing to verify is how many of these issues exist. It could be toposort just broke on the first issue? If you remove the Tiles attribute altogether, do you run into other cycles?
If this is the only one I think this is a bug that needs to be addressed (in 4.4 then probably) and I would be inclined to just skip this entity or attribute in your edge db mapping. I'm sure nobody has ever thought about supporting this in software anyway.
If there are more issues like this we might have to come up with a solution to break the cycles.
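One way to see how many independent cycles exist, without removing attributes one at a time, is to compute the strongly connected components of the dependency map. The sketch below is a plain-Python version of Tarjan's algorithm, not part of toposort or ifcopenshell. `cyclic_groups` is a hypothetical helper, and the dictionary is copied from the error message above with one acyclic entry (`IfcDirection`) added for contrast.

```python
from typing import Dict, List, Set


def cyclic_groups(deps: Dict[str, Set[str]]) -> List[Set[str]]:
    """Return every strongly connected component that contains a cycle."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    groups: List[Set[str]] = []

    def visit(node: str) -> None:
        index[node] = low[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for dep in deps.get(node, ()):
            if dep not in index:
                visit(dep)
                low[node] = min(low[node], low[dep])
            elif dep in on_stack:
                low[node] = min(low[node], index[dep])
        if low[node] == index[node]:
            # node is the root of a component: pop its members off the stack.
            component: Set[str] = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            # A single node only counts as a cycle if it references itself.
            if len(component) > 1 or node in deps.get(node, ()):
                groups.append(component)

    for name in list(deps):
        if name not in index:
            visit(name)
    return groups


deps = {
    "IfcFillAreaStyle": {"IfcFillStyleSelect"},
    "IfcFillAreaStyleTiles": {"IfcStyledItem"},
    "IfcFillStyleSelect": {"IfcFillAreaStyleTiles"},
    "IfcPresentationStyleAssignment": {"IfcPresentationStyleSelect"},
    "IfcPresentationStyleSelect": {"IfcFillAreaStyle"},
    "IfcStyleAssignmentSelect": {"IfcPresentationStyleAssignment"},
    "IfcStyledItem": {"IfcStyleAssignmentSelect"},
    "IfcDirection": set(),  # acyclic entry added for contrast
}
groups = cyclic_groups(deps)
```

Each returned set is one group of mutually dependent declarations, so the number of sets is the number of separate cycles to deal with. The recursion is fine for schema-sized graphs but would need an iterative rewrite for very deep chains.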
I found 3 separate cases of circular dependencies in total, related to the classes IfcFillAreaStyleTiles (as shown above), IfcClassificationReference and IfcBooleanResult. I made a small patch in my code to weed out the necessary properties from the classes in question:
The following code got rid of the circular dependencies error in toposort
def get_attributes(self) -> list[AttributeEdgeModel]:
    circular_refs = [
        ("IfcFillAreaStyleTiles", "Tiles"),
        ("IfcClassificationReference", "ReferencedSource"),
        ("IfcBooleanResult", "FirstOperand"),
        ("IfcBooleanResult", "SecondOperand"),
    ]
    if self._attributes is None:
        atts = []
        for att in self.entity.attributes():
            att_name = att.name()
            should_skip = False
            for circ_class, circ_att in circular_refs:
                if self.name == circ_class and att_name == circ_att:
                    should_skip = True
                    break
            if should_skip:
                continue
            atts.append(AttributeEdgeModel(self.edge_model, att))
        self._attributes = atts
    return self._attributes
I'll keep you posted on the continued progress!
Ok, these are definitely valid, useful and used in models
("IfcClassificationReference", "ReferencedSource"),
("IfcBooleanResult", "FirstOperand"),
("IfcBooleanResult", "SecondOperand")
What is precisely the limitation actually with cycles in edgedb?
I am not 100% sure, but I believe EdgeDB relies on the order of objects in the ESDL schema. But I will check if that is the case for all types.
For reference, the chains of circular dependencies reported by toposort for the remaining 2 classes are:
IfcClassificationReference
{
'IfcClassificationReference': {
'IfcClassificationReferenceSelect'
},
'IfcClassificationReferenceSelect': {
'IfcClassificationReference'
},
'IfcClassificationSelect': {
'IfcClassificationReference'
},
'IfcMaterialClassificationRelationship': {
'IfcClassificationSelect'
},
'IfcRelAssociatesClassification': {
'IfcClassificationSelect'
}
}
IfcBooleanResult
{
'IfcBooleanClippingResult': {
'IfcBooleanResult'
},
'IfcBooleanOperand': {
'IfcBooleanResult'
},
'IfcBooleanResult': {
'IfcBooleanOperand'
},
'IfcCsgSelect': {
'IfcBooleanResult'
},
'IfcCsgSolid': {
'IfcCsgSelect'
}
}
In C++ this problem is solved using "forward declarations". Maybe something similar exists in edgedb?
Declares a class type which will be defined later in this scope. Until the definition appears, this class name has incomplete type. This allows classes that refer to each other:
https://en.cppreference.com/w/cpp/language/class
Otherwise maybe you'd have to workaround it using something like:
# Cyclic
type IfcBooleanOperand = IfcBooleanResult | IfcSolidModel | ...
class IfcBooleanResult
FirstOperand -> IfcBooleanOperand
SecondOperand -> IfcBooleanOperand
# Acyclic
class IfcBooleanResultHelper
type IfcBooleanOperand = IfcBooleanResultHelper | IfcSolidModel | ...
class IfcBooleanResult extends IfcBooleanResultHelper
FirstOperand -> IfcBooleanOperand
SecondOperand -> IfcBooleanOperand
^ i.e by creating an intermediate empty type that is referenced by the select and concrete type that inherits from it.
Okay here is another update. I think I have solved most of the base type issues now.
So today I tried to upload the entire IFC schema into an edgedb instance. It no longer produces any errors, but it seems to hang for some reason, so I've raised an issue https://github.com/edgedb/edgedb/issues/3946 to get some help on it.
While I am getting help on that front, tomorrow I will be tackling the circular dependencies we have been discussing. My hope is that I will be able to include all properties of all classes with minor modifications.
I'll keep you updated on the progress!
Minor update. Circular dependencies are not an issue. Just tested now and EdgeDB does not mind the order of objects. I was successful in uploading directly all above mentioned classes (and their dependencies) related to circular dependencies in an arbitrary order.
So I will move onto IFC upload strategy tomorrow. I have gathered some thoughts on how I might pursue this. Let me know if you would recommend different strategies (I welcome any suggestion!).
By the way. Do you have a proposal for which type (and perhaps also in which order) of elements I would need to iterate over using ifcopenshell in order to cover all relevant IFC elements? I would like to see if I can roundtrip an entire (very small mind you) IFC file to/from edgedb.
I guess IfcProduct gets me all the physical objects and spatial hierarchy? But what about project information and property sets?
As a starting point for further discussion, I was planning on inserting IFC data into edgedb in roughly the following order:
In the IFC files I encounter the most, the largest number of elements are the physical elements. So I thought I would start by uploading those early on using multiple threads. Then upload spatial hierarchy and property sets once all physical objects are finished uploading.
By the way. Do you have a proposal for which type (and perhaps also in which order) of elements I would need to iterate over using ifcopenshell in order to cover all relevant IFC elements?
Somewhat strangely this doesn't exist in IFC. IFC is a forest. There is the IfcRoot tree, but outside of that most of the classes are actually "roots" in the sense that they don't have a supertype. You can see that in the inheritance listing http://ifc43-docs.standards.buildingsmart.org/IFC/RELEASE/IFC4x3/HTML/annex-c.html
So if you have a file f = ifcopenshell.open('name.ifc') you can simply iterate over it with for inst in f.
Not sure what's the reason behind your desired insertion order. You could say something like:
entity_priority = {'IfcBuildingElement': 1, 'IfcSpatialStructureElement': 2, 'IfcPropertyDefinition': 3}
# Lowest matching priority wins; anything not listed sorts last (1000).
for inst in sorted(f, key=lambda x: min((v for k, v in entity_priority.items() if x.is_a(k)), default=1000)):
    ...
But - same as above - due to the fact that a lot of entities are root-level definitions there basically isn't a meaningful compartmentalization of the schema by means of inheritance. There are the domains, but they are not available in ifcopenshell and also wouldn't be exactly what you need probably. So you'd end up with a very long list to define your order.
Oh, I wasn't aware that I could just iterate directly on the file object!
My reason behind the insertion order was loosely based on how I was expecting to do linking of objects. I was assuming I need to have uploaded the physical objects with GlobalIds before adding spatial elements that have a children property with [spatial + physical elements].
Am I overthinking this perhaps?
Well, keep in mind that most links in IFC are objectified relationships, instances of classes that establish a reference to both sides, often 1 to many, e.g. IfcRelContainedInSpatialStructure.
If you need things to exist in the DB before linking to them (makes sense, I didn't think about it) then it might be useful to do toposorting again:
Iterate over the file and, for every inst, call file.traverse(inst, max_levels=1)[1:]. That gives you the dependencies of each inst, which you can then sort topologically.
It will start with all the cartesian points for example (because these are only numbers and never references) and ends with the objectified relationships.
(and then pray there are no cycles, but generally there shouldn't be in the model I think)
Great tip!
I tried with one of the IFC example files, tessellated-item.ifc, and I think I have sorted all the IFC elements:
dep_map = dict()
for inst in self.ifc_obj:
    if inst.id() not in dep_map.keys():
        dep_map[inst.id()] = []
    for dep in self.ifc_obj.traverse(inst, max_levels=1)[1:]:
        dep_map[inst.id()].append(dep.id())
return [self.ifc_obj.by_id(x) for x in toposort_flatten(dep_map, sort=True) if x != 0]
it appears that a ID=0 was found (which doesn't exist in the file). I just skipped it for now. Any thoughts what it could point to?
Which generates
I might however have to insert certain objects together, as the numeric ID identifier will break once I add IFC content from multiple IFC files. So for example I think I will have to insert the IfcBuildingElementProxy, IfcShapeRepresentation & IfcTriangulatedFaceSet in a nested insert statement.
Or maybe I can assign each individual object to a temporary unique ID that gets resolved during the insert itself.
Nonetheless, I'll test this asap and see what I can come up with!
it appears that a ID=0 was found (which doesn't exist in the file). I just skipped it for now. Any thoughts what it could point to?
Yes, you can filter those out, they are the defined types used in selects, such as:
#21=IFCPROPERTYSINGLEVALUE('IsExternal',$,IFCBOOLEAN(.F.),$);
^^^^^^^^^^^^^^^
Or maybe I can assign each individual object to a temporary unique ID that gets resolved during the insert itself.
Didn't think of that either. Yes, I definitely wouldn't rely on the instance numbering from the SPF files. EdgeDB must have something though to uniquely identify an instance. After all, what's written in the postgres table for the instance links? Maybe there is an API to get that number, similar to e.g. MySQL's LAST_INSERT_ID() in an autoincrement table.
When you go back from EdgeDB to SPF it's probably easiest to let IfcOpenShell renumber the instances. And then you can arbitrarily create subsets or aggregates of various models. :)
I know all objects are assigned a unique uuid. And I am pretty sure that it's possible to return this as a result of any insert execution. I am hopeful that I will be able to upload the tessellated-item.ifc by the end of this week. Hopefully by the end of tomorrow!
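The idea of resolving file-local instance numbers to database ids during insertion can be sketched as below. This is an assumption about how such a map could work, not the implementation in the repository: `insert_in_dependency_order` and `fake_insert` are hypothetical, `graphlib` from the standard library stands in for toposort, and the real insert would execute a query and return the uuid assigned by the database.

```python
import uuid
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def insert_in_dependency_order(
    deps: Dict[int, Set[int]],
    insert: Callable[[int, Dict[int, uuid.UUID]], uuid.UUID],
) -> Dict[int, uuid.UUID]:
    """Insert instances so that every reference is already in the id map.

    deps maps a STEP instance number to the instance numbers it references.
    insert receives the instance number plus the resolved ids of its
    references and returns the id the database assigned to the new object.
    """
    id_map: Dict[int, uuid.UUID] = {}
    # static_order() yields dependencies before the instances that use them.
    for step_id in TopologicalSorter(deps).static_order():
        resolved = {dep: id_map[dep] for dep in deps.get(step_id, set())}
        id_map[step_id] = insert(step_id, resolved)
    return id_map


# Stand-in insert that records the call order and invents a uuid.
order: List[int] = []


def fake_insert(step_id: int, resolved: Dict[int, uuid.UUID]) -> uuid.UUID:
    order.append(step_id)
    return uuid.uuid4()


id_map = insert_in_dependency_order({25: {22, 23}, 22: set(), 23: set()}, fake_insert)
```

Because the map is keyed per file, instance numbers from different IFC files never collide as long as each file gets its own map.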
The autogenerated insert statements seem to be working! But I quickly ran into a minor snag on the Dimensions property on the IfcSIUnit class.
The Dimensions property is None but is still required according to the EXPRESS schema. However, it is mentioned that this property can be derived.
From the EXPRESS file:
ENTITY IfcSIUnit
SUBTYPE OF (IfcNamedUnit);
Prefix : OPTIONAL IfcSIPrefix;
Name : IfcSIUnitName;
DERIVE
SELF\IfcNamedUnit.Dimensions : IfcDimensionalExponents := IfcDimensionsForSiUnit (SELF.Name);
END_ENTITY;
FYI: I see that there are 85 occurrences of DERIVE in the IFC4X1.exp EXPRESS file.
Any suggestion on how I should tackle this?
For now I will use the derived() method, and make the properties that are not explicitly set but can be derived OPTIONAL. I haven't checked if all "derivable" properties have all the necessary properties to be derived downstream. But that is probably not something the database should worry about?
Update: I think I found what I was looking for: the derived() method on the entity object specifies which attributes are derived.
Let me know what you think?
This is one of the most annoying bits in the schema.
What we're talking about here is not just a derived attribute like
ENTITY IfcCartesianPoint
SUBTYPE OF (IfcPoint);
Coordinates : LIST [1:3] OF IfcLengthMeasure;
DERIVE
Dim : IfcDimensionCount := HIINDEX(Coordinates);
^^^
WHERE
CP2Dor3D : HIINDEX(Coordinates) >= 2;
END_ENTITY;
Where there is a newly introduced derived attribute defined.
The issue here is "Explicit attributes redeclared as derived in a subtype". Express allows other types of redeclarations as well, but they are not used in the IFC schema.
Looking at the serialization, the subclass is not compatible anymore with the supertype, because there is no value for Dimensions (because it is derived using a formula), but IfcNamedUnit.Dimensions is not optional. In SPF the value for such a redeclared derived attribute always needs to be *. There are only a handful of these cases.
ENTITY IfcNamedUnit
ABSTRACT SUPERTYPE OF(ONEOF(IfcContextDependentUnit, IfcConversionBasedUnit, IfcSIUnit));
Dimensions : IfcDimensionalExponents;
^^^^^^^^^^
UnitType : IfcUnitEnum;
WHERE
WR1 : IfcCorrectDimensions (SELF.UnitType, SELF.Dimensions);
END_ENTITY;
ENTITY IfcSIUnit
SUBTYPE OF (IfcNamedUnit);
Prefix : OPTIONAL IfcSIPrefix;
Name : IfcSIUnitName;
DERIVE
SELF\IfcNamedUnit.Dimensions : IfcDimensionalExponents := IfcDimensionsForSiUnit (SELF.Name);
^^^^^^^^^^
END_ENTITY;
This is tricky and causes quite a bit of implementation overhead in IfcOpenShell as well.
In IfcOpenShell information on redeclared derived is available:
>>> import ifcopenshell
>>> w = ifcopenshell.ifcopenshell_wrapper
>>> schema = w.schema_by_name('ifc2x3')
>>> schema.declaration_by_name('IfcSIUnit').derived()
(True, False, False, False)
>>> len(schema.declaration_by_name('IfcSIUnit').all_attributes())
4
>>> len(schema.declaration_by_name('IfcSIUnit').attributes())
2
IfcSIUnit only has 2 non-inherited attributes, but the length of the derived boolean list is 4, because it specifically includes the inherited attributes. In this way you can detect that Dimensions (1st attribute) is redeclared as derived in IfcSIUnit.
I think the easiest solution is to detect this and alter the definition of IfcNamedUnit to make Dimensions simply optional. Then it's directly compatible with the data that the IfcOpenShell entity_instance will give you.
Edit: just realized we arrived at the exact same conclusion. Great :)
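The detection step can be written as a small pure function. In this sketch the attribute names and flags are typed in by hand from the session and EXPRESS excerpts above rather than queried from ifcopenshell, and `redeclared_as_derived` is a hypothetical helper name.

```python
from typing import List, Sequence


def redeclared_as_derived(
    all_names: Sequence[str],
    own_count: int,
    derived_flags: Sequence[bool],
) -> List[str]:
    """Names of inherited attributes that the subtype redeclares as derived.

    all_names lists inherited attributes first, then the entity's own ones,
    matching the order of the derived flags.
    """
    inherited_count = len(all_names) - own_count
    return [
        name
        for name, is_derived in zip(all_names[:inherited_count], derived_flags)
        if is_derived
    ]


# Values for IfcSIUnit as shown in the session above.
names = ["Dimensions", "UnitType", "Prefix", "Name"]
flags = (True, False, False, False)
to_make_optional = redeclared_as_derived(names, own_count=2, derived_flags=flags)
print(to_make_optional)  # prints ['Dimensions']
```

Any name returned here is a candidate for being made optional on the supertype in the generated schema.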
Okay, here's another update!
In https://github.com/Krande/ifcdb/commit/bb1b61a0fdceb57ca1de5777326ef8cd126a9e9b I finally managed to upload the tessellated-item.ifc example file using autogenerated insert statements into an EdgeDB instance with a partial IFC schema.
I used toposort and a "uuid map" to insert objects and their dependencies. An example of an auto-generated insert statement, which inserted the element
#25=IfcRelContainedInSpatialStructure('2TnxZkTXT08eDuMuhUUFNy',$,'Physical model',$,(#22),#23)
is
WITH
ifc_22 := (SELECT IfcBuildingElementProxy filter .id = <uuid>"4832d3ba-eaf0-11ec-b780-7367b0bb24db"),
ifc_23 := (SELECT IfcBuilding filter .id = <uuid>"47614da4-eaf0-11ec-b780-57b0f1b46ff7"),
SELECT (INSERT IfcRelContainedInSpatialStructure {
GlobalId := '2TnxZkTXT08eDuMuhUUFNy',
Name := 'Physical model',
RelatedElements := {ifc_22,},
RelatingStructure := ifc_23
}
)
The SELECT wrapping the insert statement returns the uuid of the generated object.
It might not be a very efficient insert method as of yet, but I am slowly getting a better handle on the IFC schema and EdgeDB. So I am optimistic about the potential for improving both performance and stability on the upload/insert of IFC content into EdgeDB.
Regarding work on uploading the entire IFC Schema into EdgeDB, it seems that the EdgeDB folks are close to issuing a PR that will address the issue I had with a very slow initial migration in https://github.com/edgedb/edgedb-cli/pull/744.
Still no updates on the second issue I mentioned in https://github.com/edgedb/edgedb/issues/3946 regarding migrate generating an out of memory error, presumably caused by the string input generated by EdgeDB exceeding the memory limits of postgres itself. But as mentioned in the reply by the EdgeDB folks, it seems fixable.
So today I will start the work on creating queries to validate that the uploaded items are in fact correct by attempting to roundtrip the entire file.
I'll keep you posted on the progress! :)
Okay, good news!
I just now completed the first successful roundtrip of an IFC file to/from an EdgeDB database.
The following Python code snippet performs the roundtrip:
import os
from ifcdb import EdgeIO
from ifcdb.utils import top_dir
ifc_f = top_dir() / "files/tessellated-item.ifc"
db_schema_dir = "db/dbschema"
with EdgeIO(ifc_file=ifc_f, schema_name="IFC4x1", database="testdb") as io:
    io.create_schema_from_ifc_file(db_schema_dir)
    io.setup_database(db_schema_dir)
    io.insert_ifc()
    res = io.export_ifc_elements_to_ifc_str()
os.makedirs("temp", exist_ok=True)
with open("temp/tessellated-item-roundtripped.ifc", "w") as f:
    f.write(res)
I have opened the generated IFC file in Blender and just by comparing the contents it seems to have worked!
I have also finally got my github actions workflow for unittesting edgedb schema roundtripping up and running.
Right now I am in the process of writing a unittest using pytest that performs an element-by-element verification to ensure that all elements and properties have been successfully transferred.
After that I guess I'll start throwing more IFC files at it to see where the EdgeIO code needs more attention :)
Wonderful!
If there are parts you want me to take a look at with respect to IFC or IfcOpenShell internals, then let me know (including maybe some tricks to eliminate hardcoded stuff if there is any).
One more thing to take a look at is b-splines https://standards.buildingsmart.org/IFC/RELEASE/IFC4_1/FINAL/HTML/link/cube-advanced-brep.htm because I think it's the only case of nested lists in the schema. It may need something of an intermediate class like sketched below. Even in UML we couldn't represent this nicely.
type List_of_IfcCartesianPoint {
required multi link elements -> IfcCartesianPoint;
}
type IfcBSplineSurfaceWithKnots {
...
required multi link ControlPointsList -> List_of_IfcCartesianPoint;
...
}
Sure, I can use the cube-advanced-brep.ifc as my next IFC test file.
I just tried it and I immediately see that I need to address some more basic stuff, for example tuples of varying length in IfcDirection :)
type IfcDirection extending IfcGeometricRepresentationItem {
required property DirectionRatios -> tuple<float64, float64, float64> {
};
}
I will need to find how I can allow DirectionRatios (and other similar scenarios) to be a union of tuple<float64, float64, float64> or tuple<float64, float64>. I tried with a basic | operator, but that wasn't allowed. I'll ask this question in the EdgeDB discord chat, but for now I am tempted to just create two properties: DirectionRatios_2 for length 2 and DirectionRatios_3 for length 3.
Regarding ControlPointsList on IfcBSplineSurface, I see that it generates the following:
abstract type IfcBSplineSurface extending IfcBoundedSurface {
required property UDegree -> int64;
required property VDegree -> int64;
required multi link ControlPointsList -> IfcCartesianPoint;
required property SurfaceForm -> str {
constraint one_of ('CONICAL_SURF','CYLINDRICAL_SURF','GENERALISED_CONE','PLANE_SURF','QUADRIC_SURF','RULED_SURF','SPHERICAL_SURF','SURF_OF_LINEAR_EXTRUSION','SURF_OF_REVOLUTION','TOROIDAL_SURF','UNSPECIFIED');
};
required property UClosed -> bool;
required property VClosed -> bool;
required property SelfIntersect -> bool;
}
type IfcBSplineSurfaceWithKnots extending IfcBSplineSurface {
required property UMultiplicities -> tuple<int64, int64>;
required property VMultiplicities -> tuple<int64, int64>;
required property UKnots -> tuple<float64, float64>;
required property VKnots -> tuple<float64, float64>;
required property KnotSpec -> str {
constraint one_of ('PIECEWISE_BEZIER_KNOTS','QUASI_UNIFORM_KNOTS','UNIFORM_KNOTS','UNSPECIFIED');
};
}
And I understand your point by reading the EXPRESS file seeing the LIST OF .. LIST statement preceding the IfcCartesianPoint.
ENTITY IfcBSplineSurface
ABSTRACT SUPERTYPE OF (ONEOF
(IfcBSplineSurfaceWithKnots))
SUBTYPE OF (IfcBoundedSurface);
UDegree : IfcInteger;
VDegree : IfcInteger;
ControlPointsList : LIST [2:?] OF LIST [2:?] OF IfcCartesianPoint;
SurfaceForm : IfcBSplineSurfaceForm;
UClosed : IfcLogical;
VClosed : IfcLogical;
SelfIntersect : IfcLogical;
DERIVE
UUpper : IfcInteger := SIZEOF(ControlPointsList) - 1;
VUpper : IfcInteger := SIZEOF(ControlPointsList[1]) - 1;
ControlPoints : ARRAY [0:UUpper] OF ARRAY [0:VUpper] OF IfcCartesianPoint := IfcMakeArrayOfArray(ControlPointsList,
0,UUpper,0,VUpper);
END_ENTITY;
It shouldn't pose much of a problem to generate an intermediate object containing multiple IfcCartesianPoint.
I will need to find out how I can allow DirectionRatios (and other similar scenarios) to be a union of tuple<float64, float64, float64> or tuple<float64, float64>.
One other quick way around this would be to use the maximum length and append NaN
(not a number, an IEEE 754 standardized value) when the dimensionality of an instance in the file is lower. EXPRESS doesn't have NaNs, so it's fully reversible. The same applies to CartesianPoint. Or maybe just rewrite it to X, Y, optional Z. These kinds of options are discussed in this paper https://www.sciencedirect.com/science/article/abs/pii/S0926580517301826
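A minimal sketch of the NaN padding idea, in plain Python. The helper names pad_with_nan and strip_nan and the constant MAX_DIM are made up for illustration and are not part of ifcdb or ifcopenshell.

```python
import math

MAX_DIM = 3  # assumption: the largest dimensionality we need to store


def pad_with_nan(values, size=MAX_DIM):
    """Pad a 2D or 3D list of ratios/coordinates to a fixed length with NaN."""
    values = tuple(float(v) for v in values)
    if len(values) > size:
        raise ValueError(f"expected at most {size} values, got {len(values)}")
    return values + (math.nan,) * (size - len(values))


def strip_nan(values):
    """Drop the NaN padding again to recover the original values."""
    return tuple(v for v in values if not math.isnan(v))


padded = pad_with_nan([0.0, 1.0])  # stored as a fixed-size 3-tuple
restored = strip_nan(padded)       # what gets written back to the .ifc file
```

This only stays reversible as long as NaN never occurs as a real value, which holds for data coming from an EXPRESS-based file.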
Okay, with a little help from the EdgeDB folks the DirectionRatios
(and similar definitions) with arrays of varying size issue was easy to solve using a constraint expression
like so:
type IfcDirection extending IfcGeometricRepresentationItem {
required property DirectionRatios -> array<float64>{
constraint expression on (len(__subject__) = 2 or len(__subject__) = 3)
};
}
So now I've turned my focus onto ControlPointsList : LIST [2:?] OF LIST [2:?] OF IfcCartesianPoint;
. I'll let you know once I have that fixed :)
Oh :) It'd be interesting to see what other possibilities the constraint expressions offer; maybe some of the where rules can be represented by them.
Here's the reply I got from the EdgeDB guys on Discord.
He refers to the docs, where there seem to be a lot of useful functions and operators -> https://www.edgedb.com/docs/stdlib/generic
where len
is
-> https://www.edgedb.com/docs/stdlib/generic#function::std::len
which combined with the constraint expression options provides some interesting opportunities. -> https://www.edgedb.com/docs/stdlib/constraints
Very cool. There are quite a few constraints we can map:
one_of ('Open', 'Closed', 'Merged');
~ WR1 : SELF IN ['top-left', 'top-middle', 'top-right', 'middle-left', 'center', 'middle-right', 'bottom-left', 'bottom-middle', 'bottom-right'];
constraint min_value_ex(0);
~ WR1 : SELF > 0.;
constraint exclusive;
~ LIST [1:?] OF UNIQUE IfcGridAxis;
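The enumeration case in the list above can be generated mechanically. A rough sketch, assuming hypothetical helpers one_of_constraint and str_property (not taken from the ifcdb generator) that render text in the same form as the SurfaceForm and KnotSpec properties shown earlier:

```python
def one_of_constraint(values):
    """Render a list of allowed string values as a one_of constraint line."""
    quoted = ",".join(f"'{v}'" for v in values)
    return f"constraint one_of ({quoted});"


def str_property(name, values, required=True):
    """Render a str property restricted to the given enumeration items."""
    prefix = "required " if required else ""
    return "\n".join(
        [
            prefix + "property " + name + " -> str {",
            one_of_constraint(values),
            "};",
        ]
    )


knot_spec = str_property(
    "KnotSpec",
    ["PIECEWISE_BEZIER_KNOTS", "QUASI_UNIFORM_KNOTS", "UNIFORM_KNOTS", "UNSPECIFIED"],
)
print(knot_spec)
```

Where rules that reference other entities, like the IfcWall example, would need more than this kind of string templating.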
Wonder what it would look like to encode that an IfcWall should have an IfcWallType and not an IfcWindowType
CorrectTypeAssigned : (SIZEOF(IsTypedBy) = 0) OR
('IFC4X1.IFCWALLTYPE' IN TYPEOF(SELF\IfcObject.IsTypedBy[1].RelatingType));
Okay,
second IFC file cube-advanced-brep.ifc
down :)
(left: original, right: roundtripped through EdgeDB)
To insert the following IfcBSplineSurfaceWithKnots
:
"#36=IfcBSplineSurfaceWithKnots(3,1,((#48,#49),(#50,#51),(#52,#53),(#54,#55)),.UNSPECIFIED.,.F.,.F.,.F.,(4,4),(2,2),(0.,1224.74487139159),(3.,4.),.UNSPECIFIED.)"
I ended up with the following automatically generated insert statement.
WITH
ifc_48 := (INSERT IfcCartesianPoint { Coordinates := [-0.5, 0.5, 0.0]}),
ifc_49 := (INSERT IfcCartesianPoint { Coordinates := [-0.5, -0.5, 0.0]}),
IfcCartesianPointList_1 := (INSERT IfcCartesianPointICList { IfcCartesianPoints := {ifc_48,ifc_49} }),
ifc_50 := (INSERT IfcCartesianPoint { Coordinates := [-0.561004233964073, 0.27232909936926, 0.333333333333333]}),
ifc_51 := (INSERT IfcCartesianPoint { Coordinates := [-0.27232909936926, -0.561004233964073, 0.333333333333333]}),
IfcCartesianPointList_2 := (INSERT IfcCartesianPointICList { IfcCartesianPoints := {ifc_50,ifc_51} }),
ifc_52 := (INSERT IfcCartesianPoint { Coordinates := [-0.622008467928146, 0.0446581987385206, 0.666666666666667]}),
ifc_53 := (INSERT IfcCartesianPoint { Coordinates := [-0.0446581987385206, -0.622008467928146, 0.666666666666667]}),
IfcCartesianPointList_3 := (INSERT IfcCartesianPointICList { IfcCartesianPoints := {ifc_52,ifc_53} }),
ifc_54 := (INSERT IfcCartesianPoint { Coordinates := [-0.683012701892219, -0.183012701892219, 1.0]}),
ifc_55 := (INSERT IfcCartesianPoint { Coordinates := [0.183012701892219, -0.683012701892219, 1.0]}),
IfcCartesianPointList_4 := (INSERT IfcCartesianPointICList { IfcCartesianPoints := {ifc_54,ifc_55} }),
SELECT (INSERT IfcBSplineSurfaceWithKnots {
UDegree := 3,
VDegree := 1,
ControlPointsList := {IfcCartesianPointList_1,IfcCartesianPointList_2,IfcCartesianPointList_3,IfcCartesianPointList_4},
SurfaceForm := 'UNSPECIFIED',
UClosed := False,
VClosed := False,
SelfIntersect := False,
UMultiplicities := (4, 4),
VMultiplicities := (2, 2),
UKnots := (0.0, 1224.74487139159),
VKnots := (3.0, 4.0),
KnotSpec := 'UNSPECIFIED'
}
)
And the class definitions are like this:
type IfcCartesianPointICList { required multi link IfcCartesianPoints -> IfcCartesianPoint }
abstract type IfcBSplineSurface extending IfcBoundedSurface {
required property UDegree -> int64;
required property VDegree -> int64;
required multi link ControlPointsList -> IfcCartesianPointICList;
required property SurfaceForm -> str {
constraint one_of ('CONICAL_SURF','CYLINDRICAL_SURF','GENERALISED_CONE','PLANE_SURF','QUADRIC_SURF','RULED_SURF','SPHERICAL_SURF','SURF_OF_LINEAR_EXTRUSION','SURF_OF_REVOLUTION','TOROIDAL_SURF','UNSPECIFIED');
};
required property UClosed -> bool;
required property VClosed -> bool;
required property SelfIntersect -> bool;
}
type IfcBSplineSurfaceWithKnots extending IfcBSplineSurface {
required property UMultiplicities -> tuple<int64, int64>;
required property VMultiplicities -> tuple<int64, int64>;
required property UKnots -> tuple<float64, float64>;
required property VKnots -> tuple<float64, float64>;
required property KnotSpec -> str {
constraint one_of ('PIECEWISE_BEZIER_KNOTS','QUASI_UNIFORM_KNOTS','UNIFORM_KNOTS','UNSPECIFIED');
};
}
Edit: I changed the new intermediary class name from IfcCartesianPointList
to IfcCartesianPointICList
to not conflict with the existing IfcCartesianPointList
class :)
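For reference, the WITH clauses above follow a simple pattern: one insert per point, then one intermediate list object per inner list. A minimal sketch of that pattern, assuming a hypothetical helper nested_point_inserts (the actual generator in ifcdb may be structured differently):

```python
def nested_point_inserts(rows, list_type="IfcCartesianPointICList"):
    """Build WITH clauses for a LIST OF LIST OF IfcCartesianPoint.

    rows is a list of inner lists, each holding (step_id, coordinates) pairs.
    Returns the clauses and the names of the intermediate list objects.
    """
    clauses = []
    list_names = []
    for n, row in enumerate(rows, start=1):
        point_names = []
        for step_id, coords in row:
            name = f"ifc_{step_id}"
            coord_str = ", ".join(repr(float(c)) for c in coords)
            clauses.append(
                name + " := (INSERT IfcCartesianPoint { Coordinates := [" + coord_str + "]})"
            )
            point_names.append(name)
        list_name = f"IfcCartesianPointList_{n}"
        joined = ",".join(point_names)
        clauses.append(
            list_name + " := (INSERT " + list_type + " { IfcCartesianPoints := {" + joined + "} })"
        )
        list_names.append(list_name)
    return clauses, list_names


clauses, names = nested_point_inserts(
    [[(48, (-0.5, 0.5, 0.0)), (49, (-0.5, -0.5, 0.0))]]
)
print(",\n".join(clauses))
```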
Absolutely amazing!
Edit: I changed the new intermediary class name from IfcCartesianPointList to IfcCartesianPointICList to not conflict with the existing IfcCartesianPointList class :)
I suggested List_of_IfcCartesianPoint
because in my experience the Ifc- prefix is handy in distinguishing which definitions come from the schema and which don't. Granted, in your case there aren't as many extra definitions as in the IfcOpenShell API, but I made this 'mistake' in IfcOpenShell and am now gradually making changes so that only definitions directly from the IFC schema start with Ifc. ifcopenshell.file
in C++ is called IfcParse::IfcFile
and at some point this will be renamed to ifcopenshell::file
or sth.
Next step ifcopenshell.open('edgedb://localhost:5000')
?
I suggested List_of_IfcCartesianPoint because in my experience the Ifc- prefix is handy in distinguishing which definitions come from the schema and which don't.
Seems reasonable. Done.
Next step ifcopenshell.open('edgedb://localhost:5000')?
We'll get there for sure :)
I am almost done setting up unit testing of the roundtripped IFC files. But I wondered if there are any diffing functions exposed in ifcopenshell? I.e. if I wanted to compare two ifcopenshell objects for equal content while disregarding the numeric ids associated with each element?
FYI so far I have the following pytest up and running in github actions:
import pathlib
import shutil
from ifcdb import EdgeIO
import pytest
@pytest.mark.parametrize("ifc_file_name", ["tessellated-item.ifc", "cube-advanced-brep.ifc"])
def test_roundtrip_ifc_files_element_wise(ifc_files_dir, em_ifc4x1, ifc_file_name):
    db_name = ifc_file_name.replace('.ifc', '').replace('-', '_')
    ifc_file = ifc_files_dir / ifc_file_name
    db_schema_dir = pathlib.Path("temp").resolve().absolute() / db_name / "dbschema"
    if db_schema_dir.exists():
        shutil.rmtree(db_schema_dir)
    with EdgeIO(ifc_file, em=em_ifc4x1, db_schema_dir=db_schema_dir, database=db_name) as io:
        # Set up Schema & Database
        io.create_schema_from_ifc_file()
        io.setup_database(delete_existing_migrations=True)
        # Insert IFC elements
        io.insert_ifc()
        # Query Data (raw output)
        result = io.get_all(limit_to_ifc_entities=True)
        # Query Data (raw output converted to ifcopenshell object)
        new_ifc_object = io.to_ifcopenshell_object()
        # Validate Data
        # Alt 1: Compare original with new ifcopenshell objects
        original_ifc_object = io.ifc_io.ifc_obj
        # Alt 2: Compare using raw data, schema and original ifcopenshell object
I'm sure @Moult has something, but in its simplest form what you rely on is
entity_instance.get_info(include_identifier=False, recursive=True)
This will get you a python dictionary of an entity instance. That you could compare to another instance. But to what instance? Why not create it for all instances in the file, but it needs to be unordered, so a set. That requires immutability. Dicts are not. So you could try something like this with two ifcopenshell files f1
, f2
:
fingerprint = lambda file: frozenset(inst.get_info(include_identifier=True, recursive=False, return_type=frozenset) for inst in file)
assert fingerprint(f1) == fingerprint(f2)
This is a simpler version of what we use in https://academy.ifcopenshell.org/posts/calculate-differences-of-ifc-files-with-hashing/
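To illustrate why the frozenset step matters, here is a small stand-alone sketch that uses plain dicts in place of entity_instance.get_info() output. The helper names freeze and fingerprint and the sample data are only for illustration.

```python
def freeze(info):
    """Turn an attribute dict into a hashable, order-independent frozenset."""
    return frozenset(info.items())


def fingerprint(instances):
    """Unordered fingerprint of a whole collection of attribute dicts."""
    return frozenset(freeze(info) for info in instances)


a = {"type": "IfcDirection", "DirectionRatios": (0.0, 1.0)}
b = {"DirectionRatios": (0.0, 1.0), "type": "IfcDirection"}  # same content as a
c = {"type": "IfcCartesianPoint", "Coordinates": (0.5, 0.5)}

file_1 = [a, c]
file_2 = [c, b]  # same content, different order

# Dicts cannot be put in a set directly, their frozen form can
files_match = fingerprint(file_1) == fingerprint(file_2)
print(files_match)
```

Attribute values must themselves be hashable for this to work, which is why lists are represented as tuples here.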
fingerprint = lambda file: frozenset(inst.get_info(include_identifier=True, recursive=False, return_type=frozenset) for inst in file) assert fingerprint(f1) == fingerprint(f2)
I like that idea! A lot more compact and elegant than whatever I had in the works :)
The code itself runs, but the assertion fails. I can't see entirely what's wrong (the failed assertion returns a huge string showing the contents of both sets). But my hunch is that the properties referring to other ifcopenshell entities might be failing?
Maybe I can run a quick cleanup of the dictionaries and remove references to other objects before attempting to do an equals assertion?
Hm, yes I kind of assumed they would be equal. How about this then:
print(
sorted(
fingerprint(f1).symmetric_difference(fingerprint(f2)),
key=lambda x: len(str(x))
)[0]
)
So the element from the symmetric difference [(f2 - f1) | (f1 - f2)] with the shortest representation string.
Could be something as simple/annoying as floating point rounding...
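If floating point rounding does turn out to be the cause, one option is to round all floats before comparing. This is only a sketch under that assumption; round_floats and the chosen precision of 9 digits are not from ifcopenshell, and values that sit right on a rounding boundary can still end up different.

```python
def round_floats(value, ndigits=9):
    """Recursively round floats inside tuples, lists and frozensets."""
    if isinstance(value, float):
        return round(value, ndigits)
    if isinstance(value, (tuple, list)):
        return tuple(round_floats(v, ndigits) for v in value)
    if isinstance(value, frozenset):
        return frozenset(round_floats(v, ndigits) for v in value)
    return value  # strings, ints, None, ... are left untouched


original = frozenset({("type", "IfcCartesianPoint"), ("Coordinates", (0.1 + 0.2, 0.5))})
roundtripped = frozenset({("type", "IfcCartesianPoint"), ("Coordinates", (0.3, 0.5))})

equal_raw = original == roundtripped
equal_rounded = round_floats(original) == round_floats(roundtripped)
print(equal_raw, equal_rounded)
```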
I can't believe I missed this issue, this is amazing progress! I haven't read it fully (I'm on my phone now), but have you checked the src/ifcdiff codebase in the ifcopenshell repo, which uses deepdiff (slightly patched for early returns)? Maybe it helps.
I tried something like this:
get_info_props = dict(include_identifier=True, recursive=False, return_type=frozenset)
fingerprint = lambda file: frozenset(inst.get_info(**get_info_props) for inst in file)
for result in sorted(fingerprint(f1).symmetric_difference(fingerprint(f2)), key=lambda x: len(str(x))):
print(result)
assert fingerprint(f1) == fingerprint(f2)
And just by looking at the first 4 elements in the printed block
frozenset({('DirectionRatios', (0.0, 1.0)), ('type', 'IfcDirection'), ('id', 9)})
frozenset({('id', 1), ('type', 'IfcDirection'), ('DirectionRatios', (0.0, 1.0))})
frozenset({('type', 'IfcCartesianPoint'), ('Coordinates', (0.5, 0.5)), ('id', 56)})
frozenset({('type', 'IfcCartesianPoint'), ('id', 54), ('Coordinates', (0.5, 0.5))})
Might it be just that we're trying to compare the numeric id
of the IFC entities here?
@Moult Thanks! I'll check it out :)
Sorry, my fault include_identifier=True should be False :(
@aothms Okay, setting include_identifier=False
reduces the length of the symmetric difference from 354 to 180 remaining elements!
Now, looking at the printed result, I get the sense that the remaining differences are due to properties referring to other objects (from what I can tell, all remaining frozensets contain at least one reference to another IFC element)?
frozenset({('type', 'IfcPlane'), ('Position', #84=IfcAxis2Placement3D(#46,#10,#47))})
frozenset({('type', 'IfcPlane'), ('Position', #44=IfcAxis2Placement3D(#87,#88,#20))})
frozenset({('Position', #83=IfcAxis2Placement3D(#44,#45,#11)), ('type', 'IfcPlane')})
frozenset({('type', 'IfcPlane'), ('Position', #46=IfcAxis2Placement3D(#93,#19,#94))})
frozenset({('type', 'IfcVertexPoint'), ('VertexGeometry', #52=IfcCartesianPoint((0.5,0.5,0.)))})
frozenset({('type', 'IfcVertexPoint'), ('VertexGeometry', #147=IfcCartesianPoint((0.5,0.5,0.)))})
frozenset({('type', 'IfcVertexPoint'), ('VertexGeometry', #55=IfcCartesianPoint((0.5,-0.5,0.)))})
frozenset({('VertexGeometry', #49=IfcCartesianPoint((-0.5,0.5,0.))), ('type', 'IfcVertexPoint')})
@Moult : I just managed to test with the IfcDiff class. Currently it informs me that there are 2 objects that have been changed. I'll look more into how I can get more granular information about what exactly the differences are. The resulting json file is at least as follows:
{
"added": [],
"deleted": [],
"changed": {
"1hMBdOkWj7WhC2kvgZp44F": {
"has_geometry_change": true
},
"3qzoyCPy1CtfV237Rle9$t": {
"has_geometry_change": true
}
}
}
But that is enough for today. I'll continue working on this next week :)
Sorry, doing this on my phone... recursive=False should be True
Okay, recursive=True
did the trick :)
All elements and properties were 100% maintained in roundtripping of cube-advanced-brep.ifc
.
However in the roundtripping of tessellated-item.ifc
I found that the derived property CoordinateSpaceDimension
in class IfcGeometricRepresentationSubContext
was set to 0
in the original IFC file, and was None
in the roundtripped file.
So I guess that's an acceptable result for a first attempt at IFC file roundtripping?
FYI -> I added this function to debug and find which particular property and element cause the assertions to fail:
import logging
import ifcopenshell
def compare_ifcopenshell_objects_element_by_element(f1: ifcopenshell.file, f2: ifcopenshell.file):
    # Settings as corrected earlier in the thread: ignore STEP ids, expand referenced entities
    get_info_props = dict(include_identifier=False, recursive=True, return_type=frozenset)
    fingerprint = lambda file: frozenset(inst.get_info(**get_info_props) for inst in file)
    # Entities present in only one of the two files, shortest representation first
    results = sorted(fingerprint(f1).symmetric_difference(fingerprint(f2)), key=lambda x: len(str(x)))
    # Attribute names of each differing entity, used to pair up likely counterparts
    res = [set([name for name, value in result]) for result in results]
    matches = []
    i = 0
    while i < len(res):
        x = res[i]
        for k, match_eval in enumerate(res):
            if k == i or x != match_eval:
                continue
            found = tuple(sorted([i, k]))
            if found not in matches:
                matches.append(found)
            break
        i += 1
    # Compare element by element
    for a, b in matches:
        m_a = {key: value for key, value in results[a]}
        m_b = {key: value for key, value in results[b]}
        ifc_class = m_a['type']
        for key, value in m_a.items():
            other_val = m_b[key]
            # Skip nested entities (frozensets) and non-empty tuples of them
            if isinstance(value, frozenset):
                continue
            if isinstance(value, tuple) and value and isinstance(value[0], frozenset):
                continue
            if other_val != value:
                logging.error(f'Diff in Ifc Class "{ifc_class}" property: {key} := "{value}" != "{other_val}"')
It's unfortunately rather undocumented. But here are some examples
If you use
ifcopenshell.express.parse()
you get the same data structure, by the way. It's a complete representation of all EXPRESS data except for the where rules and functions.
Things to be on the lookout for are:
*
in .ifc files, for example in IfcSIUnit). It means that an attribute provided by a supertype is provided in a subtype by means of a formula (hence absent in the .ifc file). (PS: for sorting subtypes after their supertypes you can use https://pypi.org/project/toposort/)
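As an illustration of the sorting step, here is a sketch that uses graphlib from the Python standard library (3.9+) instead of the toposort package linked above; the idea is the same. The supertype_of mapping is a hand-written, truncated excerpt based on the definitions shown earlier in this thread, not output from ifcopenshell.

```python
from graphlib import TopologicalSorter

# Subtype -> supertype, truncated by hand for illustration (None = not tracked here)
supertype_of = {
    "IfcBSplineSurfaceWithKnots": "IfcBSplineSurface",
    "IfcBSplineSurface": "IfcBoundedSurface",
    "IfcBoundedSurface": None,
    "IfcDirection": "IfcGeometricRepresentationItem",
    "IfcGeometricRepresentationItem": None,
}

# TopologicalSorter expects node -> set of nodes that must come first
graph = {
    name: ({parent} if parent else set())
    for name, parent in supertype_of.items()
}

# Supertypes are emitted before the types that extend them
ordered = list(TopologicalSorter(graph).static_order())
print(ordered)
```

Writing the generated types in this order means every type that is extended has already been defined by the time it is referenced.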