Open macumber opened 4 years ago
@pflaumingo @theo-armour @tijcolem
@macumber The pugixml XML library that is in develop3 can handle UTF-16 and UTF-8, so there's always that. Or is develop3 dead?
Thanks Jason, the first item is just to verify that we have unit tests for both UTF-8 and UTF-16. The second issue is how we are passing the XML string to the embedded web viewer.
I note that OS > File menu > Import > gbXML imports UTF-16 files as desired. I assume that the data is converted to UTF-8 somewhere. If so, can the same mechanism be used with the Viewer?
At least a couple of the libraries out there can read UTF-8/16/x, but will default to output in UTF-8. I googled around a bit and it looks like QtXML does that too, so unless something else is specified, you get UTF-8 on output. The original encoding isn't carried over even if all you've done is load the file and write it right back out. I didn't dig in too deep, though, so I might have missed something.
Here is where I think OpenStudio loads the gbXML file:
If m_gbXML can be coerced to always be UTF-8, this could be good.
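To make the "coerce to UTF-8" idea concrete, here is a minimal sketch in Python (not the OpenStudio C++ code). It assumes UTF-16 input carries a byte order mark, and the function name to_utf8_xml is hypothetical. It decodes the raw bytes according to the BOM and rewrites the encoding attribute in the XML declaration so the declaration agrees with the new contents.

```python
import codecs
import re

# BOMs this sketch recognises; anything else is assumed to be plain UTF-8.
# UTF-32 and BOM-less UTF-16 are deliberately not handled here.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Matches the encoding="..." attribute of an XML declaration at the start of the text.
_DECL = re.compile(r'^(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2', re.IGNORECASE)


def to_utf8_xml(raw: bytes) -> str:
    """Decode gbXML bytes by BOM and return text whose declaration says UTF-8."""
    encoding = "utf-8"
    for bom, name in _BOMS:
        if raw.startswith(bom):
            encoding = name
            break
    # The "utf-16" and "utf-8-sig" codecs consume the BOM while decoding.
    text = raw.decode(encoding)
    # Rewrite only the declaration; the rest of the document is untouched.
    return _DECL.sub(r"\g<1>\g<2>UTF-8\g<2>", text, count=1)
```

Calling to_utf8_xml(raw).encode("utf-8") then gives bytes that match their own declaration, which is the form one would want to hand to the embedded viewer.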
@kbenne suggests: 1. learn how to invoke the gbXML translator programmatically, 2. find a file that consistently gives the error when in UTF-16, 3. write a unit test based on number 1.
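For step 2, one option is to generate the failing input rather than hunt for it. The sketch below (Python, with a hypothetical helper build_encoding_fixtures) takes one gbXML body and produces the same document as UTF-8, UTF-16 little endian and UTF-16 big endian, each with a matching declaration and BOM. It does not call the translator itself; a unit test would write each variant to disk and feed it to the gbXML reverse translator.

```python
import codecs
from typing import Dict

# tag -> (declared encoding, BOM bytes, Python codec used to encode the text)
_VARIANTS = {
    "utf8": ("UTF-8", b"", "utf-8"),
    "utf16le": ("UTF-16", codecs.BOM_UTF16_LE, "utf-16-le"),
    "utf16be": ("UTF-16", codecs.BOM_UTF16_BE, "utf-16-be"),
}


def build_encoding_fixtures(xml_body: str) -> Dict[str, bytes]:
    """Return the same XML document serialised in each encoding variant.

    xml_body is the document without its XML declaration; the declaration
    is added here so that it always matches the bytes that follow.
    """
    fixtures = {}
    for tag, (declared, bom, codec) in _VARIANTS.items():
        text = f'<?xml version="1.0" encoding="{declared}"?>\n{xml_body}'
        fixtures[tag] = bom + text.encode(codec)
    return fixtures
```

If all three variants translate to the same model, the core handles the encodings; if only the UTF-16 ones fail, that is the consistent reproducer asked for above.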
@joseph-robertson
Regarding #2. I believe that many gbXML files exported by Autodesk Revit are in 16-bit (UTF-16) format.
Here is a link to a sample:
Regarding #1. I have solved the issues in a recent version of the Spider gbXML Viewer ZIP file extractor. See these lines of code
I will be happy to work with you to update the code in the current OS gbXML viewer, but in order to help I will need some hand-holding as to what the requirements might be for external apps in OpenStudio V3.
See more information at https://github.com/NREL/OpenStudio/pull/3673
Need to ensure that OpenStudio correctly handles reading gbXML files in UTF-16 and UTF-8; need unit tests for each that read gbXML and convert to OSM. Need to verify, using a text editor, that the declared encoding matches the actual XML contents. Also need to see whether different byte orders (big endian/little endian) are used in UTF-16 files in the wild.
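The "declared encoding matches actual contents" check and the byte order question can also be scripted instead of done by eye. This is a rough Python sketch; the names EncodingReport and inspect_xml_encoding are hypothetical. It relies on the BOM to detect the real encoding and falls back to UTF-8 when there is none, so it is only meaningful for UTF-8 and BOM-marked UTF-16 files.

```python
import codecs
import re
from typing import NamedTuple, Optional


class EncodingReport(NamedTuple):
    detected: str            # codec implied by the BOM, or "utf-8" if no BOM
    declared: Optional[str]  # encoding="..." value from the XML declaration
    consistent: bool         # True if declaration and detected codec agree


_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)
_DECL = re.compile(r'<\?xml[^>]*?encoding=["\']([^"\']+)["\']', re.IGNORECASE)


def _norm(name: str) -> str:
    """Lower-case and drop separators so 'UTF-16' and 'utf_16' compare equal."""
    return name.lower().replace("-", "").replace("_", "")


def inspect_xml_encoding(raw: bytes) -> EncodingReport:
    detected, body = "utf-8", raw
    for bom, codec in _BOMS:
        if raw.startswith(bom):
            detected, body = codec, raw[len(bom):]
            break
    # Only the first few hundred bytes are needed to read the declaration.
    head = body[:400].decode(detected, errors="replace")
    match = _DECL.match(head)
    declared = match.group(1) if match else None
    # "UTF-16" is accepted for either byte order; "UTF-8" only for UTF-8.
    consistent = declared is None or _norm(detected).startswith(_norm(declared))
    return EncodingReport(detected, declared, consistent)
```

Running this over a folder of exported files would show at a glance whether both utf-16-le and utf-16-be turn up in practice, and which files declare one encoding while containing another.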
If unit tests show the core correctly handles UTF-16 files then the remaining issue is in passing UTF-16 gbXML files to the embedded gbXML editor.