Provide identity to IFC files

pasi-paasiala commented 4 years ago

Description of the proposal:

IFC files don't have a persistent identity. When a file is exported from an authoring application, it is nearly impossible to know, for example, that the next IFC file of the same project is related to the previous version. This causes problems when dealing with BCF, for instance: when a BCF Topic is sent to a CDE, it is impossible to reliably identify the files the topic refers to. BCF tries to send potential identifiers such as the IfcProject GUID, filename and timestamp, but none of these has proven reliable.

The proposal is to add to the IFC headers the following entries:

DocumentID: a UUID that persists throughout the versions of the file that are generated from the same model
DocumentRevision: a version-specific UUID
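
As a minimal sketch of how an exporter might manage these two identifiers: the DocumentID is created once and reused, while a fresh DocumentRevision is minted on every export. The `FileIdentity` class and its method names are hypothetical and not part of the proposal, which does not prescribe any implementation.

```python
import uuid
from dataclasses import dataclass, field


def _new_uuid() -> str:
    return str(uuid.uuid4())


@dataclass(frozen=True)
class FileIdentity:
    """Hypothetical holder for the two proposed header entries."""

    # Created once per source model and kept across all exports.
    document_id: str = field(default_factory=_new_uuid)
    # Replaced on every export.
    document_revision: str = field(default_factory=_new_uuid)

    def next_revision(self) -> "FileIdentity":
        """Same document, new revision (a regular re-export)."""
        return FileIdentity(document_id=self.document_id)

    def reset(self) -> "FileIdentity":
        """Brand-new document identity, e.g. after a 'Save As'."""
        return FileIdentity()
```

Calling `next_revision()` models an ordinary re-export; `reset()` models the user-triggered reset mentioned under "Additional implications".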

Is this a proposal to 'add', 'remove' or 'change' entities in the schema (pick one):

What do we win: Reliable identification of IFC files

What do we lose

Schema impact: None

Instance model impact: None

Backwards compatible: Yes

Automatic migration possible: No

Additional implications: Implementing software should let the user reset the DocumentID, for example when the file is saved under a new name ("Save As").

bekraft commented 4 years ago

I thought that the IfcProject's GlobalId (inherited from IfcRoot) would do the trick to match two IFC models against a single project? A revision is more a kind of marking than an identification. We've implemented a custom project property to carry a simple textual revision mark.

With a project Id hint in the header, there would be no need to locate an IfcProject instance and its properties. Maybe a future IFC and its physical representation (any kind of resource) should provide standardized metadata?

pasi-paasiala commented 4 years ago

When we did the original version of BCF, it was also our assumption that IfcProject's GlobalId would do the trick. It doesn't seem to work for various reasons. Another advantage for having this information in the header is that the reader doesn't need to parse the whole file to get the information. Reading just the header would suffice.
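
To illustrate the "reading just the header" point, here is a rough sketch that stops at the end of the HEADER section of a STEP Physical File and never touches the DATA section. The function name `read_header` and the sample content are made up for illustration, and the line-based scan is a simplification (it assumes `ENDSEC;` does not occur inside a header string).

```python
import io

# Illustrative sample only; the header values are placeholders.
SAMPLE = (
    "ISO-10303-21;\n"
    "HEADER;\n"
    "FILE_DESCRIPTION((''),'2;1');\n"
    "FILE_NAME('example.ifc','2020-08-12T06:32:00',(''),(''),'','','');\n"
    "FILE_SCHEMA(('IFC4'));\n"
    "ENDSEC;\n"
    "DATA;\n"
    "/* entity instances would follow here */\n"
    "ENDSEC;\n"
    "END-ISO-10303-21;\n"
)


def read_header(stream) -> str:
    """Return everything up to and including the first 'ENDSEC;' line.

    The stream is consumed line by line, so a very large DATA section
    is never read.
    """
    lines = []
    for line in stream:
        lines.append(line)
        if "ENDSEC;" in line:
            break
    return "".join(lines)


header = read_header(io.StringIO(SAMPLE))
```

With a real file, `open(path, encoding="utf-8", errors="replace")` could be passed instead of the `StringIO` object.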

berlotti commented 4 years ago

Could you elaborate on the 'for various reasons'? Maybe we can fix those...

We are moving more and more to partial IFC exchange where files are just snapshots at a given time with a given query. Keeping versioning with files will be impossible, so we might want to look at solving the core issues.

NickNisbet commented 4 years ago

Either such an identifier is machine generated, whereupon it will suffer from any failings that IfcProject.GlobalId (IfcContext.GlobalId) has,

or such an identifier is human generated, whereupon it will suffer from the risk of duplication and default values, just like IfcProject.Name (IfcContext.Name).

Putting it in the header doesn’t resolve the ‘various issues’.

jasollien commented 4 years ago

Hi all, I am a product developer for Bimsync (catenda) and have been a part of the BCF collaboration group for 5 years.

I agree with @ykulbak. From the BCF perspective, we really need an identity to IFC files.

One example is related models in BCF import: because IFC files have no identity, there is no way to state in the BCF file which IFC models are related. We have tried a workaround using a hash code of the files, but it won't work, for example, if a new revision of the IFC file is uploaded.
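
A small sketch of why a content hash falls short here: it identifies one exact byte sequence, so any new revision gets an unrelated value. The helper name `file_fingerprint` and the byte strings are illustrative stand-ins, not real IFC content, and SHA-256 is just one possible choice of hash.

```python
import hashlib


def file_fingerprint(content: bytes) -> str:
    """Hash of the exact file bytes (SHA-256 chosen for illustration)."""
    return hashlib.sha256(content).hexdigest()


# Stand-ins for two revisions of the same exported model.
revision_1 = b"model export, wall height 2700"
revision_2 = b"model export, wall height 2800"

same_file = file_fingerprint(revision_1) == file_fingerprint(revision_1)
same_document = file_fingerprint(revision_1) == file_fingerprint(revision_2)
```

Here `same_file` is true and `same_document` is false: the hash can confirm that two files are byte-identical, but it cannot tell that two different files are revisions of one document.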

Currently, we are "guessing" the related models, based on products being selected, or visible. This is a very poor solution, partly because models may be visible in a BCF viewpoint, without having selected products.

Because of this, we quite often get requests from customers wondering why their imported BCFs are not correct in Bimsync.

If IFC files had an identity, we could pass this information together with the BCF and know for sure which models are related.

ykulbak commented 4 years ago

Until recently Aconex was also "guessing" the related models, exactly as @jasollien describes. The process is very brittle and therefore misleads users (who don’t understand why the BCF Topic viewpoint is not loaded correctly).

A typical case where the IfcProject GUID is not unique is when IFC “part-models” are exported from a central model. These “part-models” typically follow some spatial division of the project to different "zones". All the “part-models” exported at a point-in-time will usually share the IfcProject GUID and all other IFC Header values which makes it nearly impossible to distinguish between them in the BCF workflow.

Another, less frequent case is when a reference/template file is used to start new models: we had cases where all the disciplines on a project had the same IfcProject GUID.
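
The part-model and template cases can be sketched as a simple grouping exercise: once several files report the same IfcProject GUID, the GUID alone no longer selects one file. The file names and GUID strings below are invented placeholders, and both helper functions are hypothetical.

```python
from collections import defaultdict


def group_by_project_guid(files):
    """Map each IfcProject GUID to the list of file names that carry it."""
    groups = defaultdict(list)
    for name, project_guid in files:
        groups[project_guid].append(name)
    return dict(groups)


def ambiguous_guids(files):
    """Return only the GUIDs that are shared by more than one file."""
    return {
        guid: names
        for guid, names in group_by_project_guid(files).items()
        if len(names) > 1
    }


# Placeholder data: three zone exports from one central model, one other file.
exports = [
    ("zone-north.ifc", "GUID-A"),
    ("zone-south.ifc", "GUID-A"),
    ("zone-east.ifc", "GUID-A"),
    ("site.ifc", "GUID-B"),
]
collisions = ambiguous_guids(exports)
```

In this made-up data set, `collisions` contains `GUID-A` with three candidate files, which is exactly the situation where a BCF client has to guess.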

ykulbak commented 4 years ago

In a typical BCF use case a user would like to load a topic viewpoint and have confidence they are looking at the same models as the original author. To achieve this we require a way to identify, without ambiguity, a specific model file referenced in a topic. The following contexts exist:

Project: Context across documents. This context has no concrete definition currently in IFC (Is it a building? is it multiple buildings? is it a location?)
Document: A model file e.g. building-architectural.rvt. All revisions have the same Document identifier
Document-Revision: A revision of a model e.g. building-architectural-rev1.ifc, building-architectural-rev2.ifc

BCF file headers attempt to reference a Document-Revision in the form of an .ifc file:

File name
File timestamp
IfcProjectId (from inside the IFC file)

None of these are guaranteed to uniquely identify a model revision:

Filename: may capture the document context and/or the revision context (without knowing which, this field is ambiguous)
File timestamp: may capture when the IFC file was exported or may capture the timestamp of the revision (ambiguous)
IfcProjectId: has an ambiguous definition (see below) and without a concrete definition - this is not helpful apart from saying 'These documents have a relationship'

The schema documentation for IfcProject says:

"... The project establishes the context for information to be exchanged or shared, and it may represent a construction project but does not have to. "

This definition is so loose that it should be no surprise that there's such inconsistency across different vendors. Therefore currently detecting models is a heuristic guess between external systems and the user cannot have confidence they are actually viewing the same thing as the original author.

File-based model exchanges, which are still the absolute majority in the industry, require a robust way to identify a document (a set of revisions exported from the same model / zone) and a robust way to identify each revision of that document.

NickNisbet commented 4 years ago

Do these proposals make sense when the intention may be to reopen the authoring application, not necessarily the shared IFC?

All the mentioned attributes may support an identification, but I don't see why it is felt necessary or possible to always have a definitive match.

The user may want to examine different models to see if the problem existed earlier or has been solved since.

GeorgDangl commented 4 years ago

@NickNisbet, that's a good point. As far as I understand it, this proposal would allow you to track IFC exports in the authoring tool, e.g. your CAD application could have internal logic to remember at what state the model was exported to an IFC file with document revision f51b42a8-1d71-4cd3-894b-f2da067f9456.

BCF workflows usually require you to load the exact same model (same DocumentID / DocumentRevision combination) and may often have use cases where you want to switch to another model with the same DocumentID but maybe at a later stage, e.g. after problems have been fixed.
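
The two lookups described here could be sketched as follows, assuming the proposed DocumentID / DocumentRevision pair were available for every model a CDE holds. `ModelRef`, `find_exact` and `other_revisions` are hypothetical names, and the identifier strings are placeholders rather than real UUIDs.

```python
from typing import List, NamedTuple, Optional


class ModelRef(NamedTuple):
    document_id: str
    document_revision: str


def find_exact(wanted: ModelRef, available: List[ModelRef]) -> Optional[ModelRef]:
    """The exact model the topic author was looking at, if it is available."""
    return wanted if wanted in available else None


def other_revisions(wanted: ModelRef, available: List[ModelRef]) -> List[ModelRef]:
    """Other revisions of the same document, e.g. after problems were fixed."""
    return [
        m
        for m in available
        if m.document_id == wanted.document_id
        and m.document_revision != wanted.document_revision
    ]


available = [
    ModelRef("doc-west", "rev-1"),
    ModelRef("doc-west", "rev-2"),
    ModelRef("doc-east", "rev-1"),
]
topic_model = ModelRef("doc-west", "rev-1")
exact = find_exact(topic_model, available)
alternatives = other_revisions(topic_model, available)
```

With this placeholder data, `exact` is the `doc-west` / `rev-1` model and `alternatives` holds only `doc-west` / `rev-2`; the `doc-east` file is never confused with either.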

Right now, IfcProject really is ambiguous - if I'm splitting my architectural model into 4 IFC files, should all have the same IfcProject Id? And if they do, how can I uniquely identify them? If I want to link to an issue in file Architectural_West.ifc, how can I correctly distinguish it from Architectural_East.ifc?

devonsparks commented 4 years ago

@NickNisbet I'm curious: why wouldn't using the Header section to store a persistent ID for the exchange file solve the problem? As @pasi-paasiala notes, it at least has the benefit that you don't need to parse the whole file.

Keeping a persistent ID in the form of a URI within the external_file_identifications entity of the Header, for example, seems like a decent fit. Per P21e3, external_file_identifications is a "list of external addresses for the files whose schema populations are to be included in the schema population of this file." Each list element is a 3-tuple, where the first entry of each tuple shall "be the address of the referenced exchange structure represented as a Universal Resource Identifier". The second and third elements of the tuple hold time stamps and hashes respectively, which are likely to be helpful for systems supporting version control.

More generally, buildingSMART's adoption of P21e3 could do a lot to help with the challenges around persistent identity and GlobalIds. P21e3's new Anchor and Reference sections allow import and export of instances by tagging them with URIs. Systems consuming these IFC fragments can then resolve entities by their Anchor'd URIs and gain entry into the resolved instance (sub)graph. Once we can anchor any instance in an exchange file to a URI, it's unclear to me whether "GlobalIds" are needed at all. Anchors simultaneously provide a standards-based identity mechanism while supporting instance encapsulation (i.e., not every instance in an exchange file needs to have a GlobalId). Combine that with a linked data resolution mechanism à la GS1's Digital Link and some pretty cool workflows open up.

Thoughts?

NickNisbet commented 4 years ago

@DevonSparks

A header identifier would be liable to the same use and abuse as the IfcProject GUID. If used properly it is fine, if it is not used properly it is problematic.

CBenghi commented 4 years ago

@NickNisbet,

A header identifier would be liable to the same use and abuse as the IfcProject GUID. If used properly it is fine, if it is not used properly it is problematic.

True, but CDEs could enforce/support some level of coherence and validation on this.

devonsparks commented 4 years ago

Thanks both. The challenges with using the IfcProject GUID for file identity make sense (e.g., @ykulbak's mention of part-models and templates). I can't see why Header attributes would suffer the same problems, though, because:

Every P21 Header is defined to "contain information that is applicable to the entire exchange structure", making it a natural place to store identity information of the file, independent of the enclosed instances. Multiple files might have the same GlobalId on each of their IfcProject instances like @ykulbak's use case, but a Header attribute can still uniquely identify each part file as a distinct resource.
"Proper use" seems like a definitional issue: the proper use of a Header attribute is whatever we define it to be in the spec. Nothing technically stops me from writing a poem on the outside of an envelope instead of a mailing address, but I can no longer have any expectations about the delivery of my mail, because I've violated the contract the Post Office set up for "proper use" of the mail system. So it is too with P21 Header attributes.

Maybe an example of the challenges of using Header attributes for file identifiers would help clarify?

In summary, I didn't see any conceptual issue with @pasi-paasiala's !DocumentID Header attribute proposal. I only wanted to call out possibly relevant features within the P21e3 spec itself for this purpose, so that we might do more with less :)

Thanks!

o314 commented 4 years ago

A close but not identical issue

make id a persistent id (this whole page)
allow short id

GUIDs/UUIDs are very, very long and give one of the worst user experiences possible when people share links.
They have almost completely disappeared from URLs / web addresses over the past two decades. There must be a reason.

Dell builds its business on service tags that are only 7 alphanumeric characters long (base 32).

Sheet sets are quite often numbered with 3 digits, e.g. up to 999. What about considering that an object can be tracked contextually by a 4-digit id (PS)?

Here is a table summarizing the expressiveness of an id from its vocabulary and length :

id len	base	world size	world size (sci repr)
3	32	32768	32e3
7	32	34359738368	34e9
128	2	-too long-	340e36
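
For anyone who wants to check the figures in the table, the size of an id space is simply the base raised to the id length. A short sketch (the function name `id_space` is only for illustration):

```python
def id_space(length: int, base: int) -> int:
    """Number of distinct ids with `length` symbols drawn from `base` values."""
    return base ** length


# The three rows of the table: (id length, base).
rows = [(3, 32), (7, 32), (128, 2)]
sizes = {row: id_space(*row) for row in rows}

for (length, base), size in sizes.items():
    print(f"length {length:>3}, base {base:>2}: {size:.2e}")
```

Python integers are arbitrary precision, so the 128-bit case is computed exactly rather than approximated.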

The GUID is the last row. It can address more than 1e36 objects (a sextillion on the long scale). A range of a few thousand, or a few billion, should be enough for any project.

PS: a valid data context should be bound to an xref rather than a sheet; the numbers were carried over approximately to illustrate the point.

buildingSMART / NextGen-IFC

Provide identity to IFC files #67