aothms closed 2 years ago
Strongly in favor of this. What about UTF8?
UTF-8 is just the encoding; the real complexity comes from comparing Unicode strings: http://unicode.org/faq/normalization.html
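To illustrate the comparison problem, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `same_text` and the choice of NFC are assumptions made for illustration only; nothing in this thread settles which normalization form IDS should use.

```python
import unicodedata


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after normalizing both to the same Unicode form.

    `form` defaults to NFC here purely as an assumption for the example.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


precomposed = "\u00c4rger"   # A-with-diaeresis as a single code point
decomposed = "A\u0308rger"   # plain A followed by a combining diaeresis

# Both render identically, but a naive comparison treats them as different.
print(precomposed == decomposed)
print(same_text(precomposed, decomposed))
```

Both strings are valid UTF-8 once encoded, which is why the encoding alone does not answer the question of equality.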
I've got this test case right now:
Is that sufficient or would more need to be added?
That would test the capabilities of the parser more than the IDS itself, but we can add some example string patterns: proper SPF parsing is a prerequisite for IDS handling, and I guess it might be illustrative to have them in there. I took these from the ISO doc (hope they don't mind):
Encoded string            Decoded value
'CAT'                     CAT
'Don''t'                  Don't
''''                      '
''                        (string of length zero)
'\S\Drger'                Ärger
'h\S\ttel'                hôtel
'\PE\\S\*\S\U\S\b'        Њет
Might be good to have the apostrophe in there and a different code page with the \PE. That covers IFC-SPF.
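For reference, the examples above can be decoded with a short sketch like the one below. It is not taken from any existing parser: the function name `decode_spf_string` is hypothetical, and the handling follows my reading of the string directives (`''`, `\\`, `\S\`, `\P?\`, `\X\`, `\X2\`, `\X4\`). Anything that does not match one of those directives is passed through unchanged, which a strict parser would probably reject.

```python
import re

# One alternative per directive; order matters for the doubled backslash.
_DIRECTIVE = re.compile(
    r"''"                                           # doubled apostrophe
    r"|\\\\"                                        # doubled backslash
    r"|\\P(?P<page>[A-I])\\"                        # switch ISO 8859 code page
    r"|\\S\\(?P<high>[ -~])"                        # character + 0x80 in that page
    r"|\\X\\(?P<x8>[0-9A-F]{2})"                    # one 8-bit code point
    r"|\\X2\\(?P<x16>(?:[0-9A-F]{4})+)\\X0\\"       # run of 16-bit code units
    r"|\\X4\\(?P<x32>(?:[0-9A-F]{8})+)\\X0\\"       # run of 32-bit code points
)


def decode_spf_string(token: str) -> str:
    """Decode a quoted SPF string token (hypothetical helper, sketch only)."""
    if len(token) < 2 or token[0] != "'" or token[-1] != "'":
        raise ValueError("expected a token delimited by apostrophes")
    codepage = "iso-8859-1"  # assumed default until a \P?\ directive appears

    def replace(m: re.Match) -> str:
        nonlocal codepage
        if m.group(0) == "''":
            return "'"
        if m.group(0) == "\\\\":
            return "\\"
        if m.group("page") is not None:
            # \PA\ selects ISO 8859-1, \PB\ selects 8859-2, ... \PI\ selects 8859-9
            codepage = "iso-8859-%d" % (ord(m.group("page")) - ord("A") + 1)
            return ""
        if m.group("high") is not None:
            return bytes([ord(m.group("high")) + 0x80]).decode(codepage)
        if m.group("x8") is not None:
            return chr(int(m.group("x8"), 16))
        if m.group("x16") is not None:
            return bytes.fromhex(m.group("x16")).decode("utf-16-be")
        return bytes.fromhex(m.group("x32")).decode("utf-32-be")

    return _DIRECTIVE.sub(replace, token[1:-1])


examples = [
    "'CAT'", "'Don''t'", "''''", "''",
    r"'\S\Drger'", r"'h\S\ttel'", r"'\PE\\S\*\S\U\S\b'",
]
for token in examples:
    print(token, "->", repr(decode_spf_string(token)))
```

The last example is the interesting one: `\PE\` switches to ISO 8859-5, so the following `\S\` characters resolve to Cyrillic rather than Latin-1 code points.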
Do we need to test something for an IFC-XML file with some encoding?
Do we need an IDS-XML file with an encoding other than UTF-8 (the XML default; the linked test case has no XML declaration)?
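As a rough illustration of what such a test would exercise, the sketch below parses the same value from two byte streams with Python's standard `xml.etree.ElementTree`: one without an XML declaration (read as UTF-8) and one declaring ISO-8859-1. The bare `simpleValue` element is a simplified stand-in, not a valid IDS document.

```python
import xml.etree.ElementTree as ET

text = "\u00c4rger"

# No XML declaration: the parser falls back to UTF-8.
utf8_doc = f"<simpleValue>{text}</simpleValue>".encode("utf-8")

# Explicit declaration: the same text stored as ISO-8859-1 bytes.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    f"<simpleValue>{text}</simpleValue>"
).encode("iso-8859-1")

# The byte streams differ, but the parsed values should be the same string.
values = [ET.fromstring(doc).text for doc in (utf8_doc, latin1_doc)]
print(values[0] == values[1] == text)
```

If the XML parser honors the declaration, the encoding is resolved before any IDS logic sees the value, so the test mostly checks the XML layer.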
Edit:
> and a different code page with the \PE
Oops, there's actually no way to write that with IfcOpenShell.
As you're probably aware, IFC/STEP/SPF has a specific string encoding mechanism for non-ASCII code points: https://technical.buildingsmart.org/resources/ifcimplementationguidance/string-encoding/
Just to clarify: I assume an xs:pattern / simpleValue is matched after both the IFC string encoding and the IDS value have been normalized to Unicode or something similar?