buildingSMART / IDS

Computer interpretable (XML) standard to define Information Delivery Specifications for BIM (mainly used for IFC)
https://www.buildingsmart.org/standards/bsi-standards/information-delivery-specification-ids/

string value matches and express character encoding #64

Closed aothms closed 2 years ago

aothms commented 2 years ago

As you're probably aware, IFC/STEP/SPF has a specific string encoding mechanism for non-ASCII code points: https://technical.buildingsmart.org/resources/ifcimplementationguidance/string-encoding/

Just to clarify here: I assume an xs:pattern / simpleValue is matched after the IFC string encoding (and the IDS value) has been normalized to Unicode or something similar?
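To make the question concrete, here is a minimal sketch of why the order matters. It assumes a hypothetical pattern `Ä.*` and uses Python's `re` module as a rough stand-in for xs:pattern (the two regex dialects are not identical); the decoded value is hard-coded rather than produced by a real SPF parser.

```python
import re

# Hypothetical IDS pattern; Python's re only approximates xs:pattern.
pattern = "Ä.*"

raw_spf = r"\S\Drger"   # the value as it appears inside the IFC-SPF file
decoded = "Ärger"       # the same value after SPF string decoding

# Matching the raw, still-encoded text fails; the decoded text matches.
matches_raw = re.fullmatch(pattern, raw_spf) is not None
matches_decoded = re.fullmatch(pattern, decoded) is not None
print(matches_raw, matches_decoded)
```

If matching were done on the raw file content, an IDS author would have to write SPF escapes into the pattern, which is presumably not what anyone wants.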

berlotti commented 2 years ago

Strongly in favor of this. What about UTF8?

aothms commented 2 years ago

UTF-8 is just the encoding; the real complexity comes from comparing Unicode strings: http://unicode.org/faq/normalization.html
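A small sketch of the comparison problem, using Python's standard `unicodedata` module. The helper name `same_text` and the choice of NFC are assumptions for illustration; the thread does not settle on a normalization form.

```python
import unicodedata


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


composed = "h\u00f4tel"     # 'ô' as a single code point (U+00F4)
decomposed = "ho\u0302tel"  # 'o' followed by a combining circumflex (U+0302)

# Both render as "hôtel", but the code point sequences differ.
print(composed == decomposed)
print(same_text(composed, decomposed))
```

Both strings can be stored as valid UTF-8, so agreeing on UTF-8 alone does not make them compare equal.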

Moult commented 2 years ago

I've got this test case right now:

Is that sufficient or would more need to be added?

aothms commented 2 years ago

That would test the capabilities of the parser more than the IDS itself, but we can add some example string patterns: proper SPF parsing is a prerequisite for IDS handling, and I guess it might be illustrative to have them in there. I took these from the ISO document (hope they don't mind):

Encoded (SPF)           Decoded
'CAT'                   CAT
'Don''t'                Don't
''''                    '
''                      (string of length zero)
'\S\Drger'              Ärger
'h\S\ttel'              hôtel
'\PE\\S\*\S\U\S\b'      Њет

Might be good to have the apostrophe in there and a different code page with the \PE. That covers IFC-SPF.
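As an illustration of what a parser has to do with these strings, here is a rough decoder sketch. The function name `decode_spf_string` is made up for this example, and the handling of `\X\`, `\X2\` and `\X4\` reflects my reading of the string encoding rules rather than anything stated in this thread; it is not the implementation of any particular toolkit.

```python
import re

# One alternative per escape form; named groups tell the handler which one hit.
_ESCAPE = re.compile(
    r"""
      (?P<quote>'')                            # doubled apostrophe
    | (?P<backslash>\\\\)                      # doubled backslash
    | \\P(?P<page>[A-I])\\                     # select ISO 8859 part 1..9
    | \\S\\(?P<shift>[\x20-\x7e])              # char code + 128 in current page
    | \\X\\(?P<x>[0-9A-F]{2})                  # one 8-bit code point
    | \\X2\\(?P<x2>(?:[0-9A-F]{4})+)\\X0\\     # 16-bit code units
    | \\X4\\(?P<x4>(?:[0-9A-F]{8})+)\\X0\\     # 32-bit code points
    """,
    re.VERBOSE,
)


def decode_spf_string(token: str) -> str:
    """Decode one quoted SPF string token into a Python (Unicode) string."""
    if len(token) < 2 or token[0] != "'" or token[-1] != "'":
        raise ValueError("expected a string token enclosed in apostrophes")
    codec = "iso8859_1"  # default page until a \P.\ directive changes it

    def replace(match):
        nonlocal codec
        kind = match.lastgroup
        value = match.group(kind)
        if kind == "quote":
            return "'"
        if kind == "backslash":
            return "\\"
        if kind == "page":
            codec = "iso8859_%d" % (ord(value) - ord("A") + 1)
            return ""
        if kind == "shift":
            return bytes([ord(value) + 0x80]).decode(codec)
        if kind == "x":
            return chr(int(value, 16))
        if kind == "x2":
            return bytes.fromhex(value).decode("utf-16-be")
        return bytes.fromhex(value).decode("utf-32-be")

    return _ESCAPE.sub(replace, token[1:-1])


examples = [r"'CAT'", r"'Don''t'", r"''''", r"''",
            r"'\S\Drger'", r"'h\S\ttel'", r"'\PE\\S\*\S\U\S\b'"]
for token in examples:
    print(token, "->", repr(decode_spf_string(token)))
```

The substitution runs left to right, so the `\PE\` directive is applied before the `\S\` escapes that follow it. Error handling for malformed input is deliberately left out.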

Do we need to test something for an IFC-XML file with some encoding?

Do we need an IDS-XML file with an encoding other than UTF-8 (the XML default; the linked test case has no XML declaration)?
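For reference, a quick sketch of what such a test would exercise, using Python's standard `xml.etree.ElementTree`. The one-element document is a stand-in for illustration, not a valid IDS file, and `ids_like_document` is a made-up helper.

```python
import xml.etree.ElementTree as ET

VALUE = "Ärger"


def ids_like_document(encoding=None) -> bytes:
    """Build a tiny XML document, with or without an encoding declaration."""
    declaration = ""
    if encoding is not None:
        declaration = f'<?xml version="1.0" encoding="{encoding}"?>'
    text = f"{declaration}<simpleValue>{VALUE}</simpleValue>"
    # Without a declaration the XML default (UTF-8) applies.
    return text.encode(encoding or "utf-8")


for encoding in (None, "UTF-8", "ISO-8859-1"):
    raw = ids_like_document(encoding)
    parsed = ET.fromstring(raw).text
    print(encoding, len(raw), parsed == VALUE)
```

The bytes on disk differ between encodings, but a conforming XML parser hands back the same string, so this again mostly tests the parser rather than the IDS logic.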

Edit:

> and a different code page with the \PE

Oh oops, there's actually no way to write that with IfcOpenShell.