buildingSMART / NextGen-IFC


UTF-8 encoding for IFC serialisations #7

Open berlotti opened 4 years ago

berlotti commented 4 years ago

Currently, STEP-serialized IFC requires strings to be encoded according to ISO 8859-1 (more info at https://technical.buildingsmart.org/resources/ifcimplementationguidance/string-encoding/). The latest edition of the STEP ISO standard (ISO 10303-21) can use UTF-8 for encoding, which is widely adopted and the de facto standard.
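To make the difference concrete, here is a minimal Python sketch (not taken from any IFC toolkit) of the same string literal written two ways. The first uses only printable ASCII and wraps every other character in the ISO 10303-21 `\X2\...\X0\` (16-bit) or `\X4\...\X0\` (32-bit) escape directives. The second writes the characters directly as UTF-8, under the assumption that only the apostrophe and the backslash still need escaping. Both helper names are hypothetical.

```python
def encode_p21_escaped(text: str) -> str:
    """Hypothetical helper: encode text as a Part 21 string literal that
    contains only printable ASCII, using \\X2\\ (16-bit) and \\X4\\ (32-bit)
    escape runs terminated by \\X0\\ for everything else."""
    out: list[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "'":
            out.append("''")      # apostrophe is doubled inside a string
            i += 1
        elif ch == "\\":
            out.append("\\\\")    # backslash is doubled inside a string
            i += 1
        elif 0x20 <= ord(ch) <= 0x7E:
            out.append(ch)        # printable ASCII passes through unchanged
            i += 1
        else:
            # Collect a run of non-ASCII characters that share one escape width.
            wide = ord(ch) > 0xFFFF
            digits: list[str] = []
            while i < len(text):
                cp = ord(text[i])
                if 0x20 <= cp <= 0x7E or (cp > 0xFFFF) != wide:
                    break
                digits.append(f"{cp:08X}" if wide else f"{cp:04X}")
                i += 1
            out.append(("\\X4\\" if wide else "\\X2\\") + "".join(digits) + "\\X0\\")
    return "'" + "".join(out) + "'"


def encode_p21_utf8(text: str) -> bytes:
    """Hypothetical helper: the same literal written directly as UTF-8,
    assuming only the apostrophe and backslash still need escaping."""
    body = text.replace("\\", "\\\\").replace("'", "''")
    return ("'" + body + "'").encode("utf-8")


if __name__ == "__main__":
    name = "Wand Süd"
    escaped = encode_p21_escaped(name)
    print(escaped, len(escaped.encode("ascii")), "bytes")
    direct = encode_p21_utf8(name)
    print(direct, len(direct), "bytes")
```

For `Wand Süd` the escaped literal is `'Wand S\X2\00FC\X0\d'` (21 bytes), while the direct UTF-8 literal is 11 bytes and stays human-readable in a text editor.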

I suggest using UTF-8 encoding for all serializations of IFC.

janbrouwer commented 4 years ago

yes please!

pipauwel commented 4 years ago

That... seems like common sense? What are the effects of the change? Are there any?

janbrouwer commented 4 years ago

Effects that I can think of:

On Wed, Mar 4, 2020, 23:30 Pieter Pauwels notifications@github.com wrote:

That... seems like common sense? What are the effects of the change? Are there any?


TLiebich commented 4 years ago

Before making a decision, the increase in file size from using UTF-8 instead of ISO 8859-1 needs to be investigated. In a typical IFC2x3 CV or IFC4 RV file, about 98% of the text comes from the ISO 8859-1 code tables (e.g. all geometry).
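One way to start that investigation is simply to measure the same text in both encodings. Characters in the ASCII range take one byte in both ISO 8859-1 and UTF-8, so lines made only of ASCII (such as geometry) do not grow at all; only characters from the upper half of ISO 8859-1 (e.g. `ß`, `ü`) take two bytes in UTF-8 instead of one. A minimal Python sketch, where the helper names and sample strings are hypothetical and the file helper assumes the file's bytes are ISO 8859-1:

```python
from pathlib import Path


def byte_sizes(text: str) -> tuple[int, int]:
    """Return (ISO 8859-1 size, UTF-8 size) in bytes for the same text."""
    return len(text.encode("iso-8859-1")), len(text.encode("utf-8"))


def utf8_growth_of_file(path: Path) -> float:
    """Relative size increase if a file stored as ISO 8859-1 were re-encoded as UTF-8."""
    # Decoding as ISO 8859-1 never fails: every byte maps to exactly one code point.
    text = path.read_bytes().decode("iso-8859-1")
    latin1, utf8 = byte_sizes(text)
    return (utf8 - latin1) / latin1 if latin1 else 0.0


if __name__ == "__main__":
    geometry = "#42=IFCCARTESIANPOINT((0.,0.,3.5));"
    name = "Außenwand Süd"
    print(geometry, byte_sizes(geometry))  # identical sizes: pure ASCII
    print(name, byte_sizes(name))          # ß and ü each cost one extra byte
```

Running `utf8_growth_of_file` over a representative set of project files (e.g. the large MEP models mentioned below) would give the measured overhead rather than an estimate.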

And file size does matter! Practitioners today are dealing with IFC files larger than 500 MB (e.g. for MEP models), and partial/transactional exchange cannot cover every exchange scenario.

Another observation: I would assume that complete file-based exchange is best served by sticking to the STEP physical file, whereas other transactions are better served by ifcXML, ifcJSON, etc. In those partial transactions file size is not a problem, and XML and JSON already support UTF-8.

berlotti commented 4 years ago

When adopting the 2016 version of STEP, this is in line with the standard. The additional restriction for IFC would be: use ONLY UTF-8 (exclude the older encodings).
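If IFC restricted files to UTF-8 only, a reader could enforce that with strict decoding and reject anything else. A minimal Python sketch, assuming the whole file is held in memory; `decode_ifc_spf_utf8` is a hypothetical name, and checking for the `ISO-10303-21;` token at the start of the file is only a basic sanity check, not full validation:

```python
MAGIC = "ISO-10303-21;"  # token that opens a STEP physical file


def decode_ifc_spf_utf8(data: bytes) -> str:
    """Hypothetical helper: decode a STEP physical file, accepting UTF-8 only."""
    try:
        # Strict decoding (the default) raises on any invalid UTF-8 sequence,
        # e.g. a lone ISO 8859-1 byte such as 0xFC for "ü".
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(f"not valid UTF-8 (first bad byte at offset {exc.start})") from exc
    if not text.startswith(MAGIC):
        raise ValueError(f"missing {MAGIC} header token")
    return text
```

A file written as ISO 8859-1 with raw upper-half bytes would be rejected here instead of being silently misread.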