NREL / OpenStudio

OpenStudio is a cross-platform collection of software tools to support whole building energy modeling using EnergyPlus and advanced daylight analysis using Radiance.
https://www.openstudio.net/

UTF-16 gbXML files not handled correctly in some cases #3684

Open · macumber opened 4 years ago

macumber commented 4 years ago

See more information at https://github.com/NREL/OpenStudio/pull/3673

We need to ensure that OpenStudio correctly reads gbXML files encoded in both UTF-16 and UTF-8, with unit tests for each that read the gbXML and convert it to OSM. We also need to verify, using a text editor, that each file's actual encoding matches its XML declaration. Finally, we should check whether both UTF-16 byte orders (big endian and little endian) are used in the wild.
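A minimal sketch of fixtures such unit tests could use, written in Python for brevity rather than OpenStudio's C++. The same tiny gbXML document is encoded as UTF-8 and as UTF-16 in both byte orders (each UTF-16 variant carries a BOM and declares `encoding="UTF-16"`), and every variant is parsed from raw bytes with the standard library's `xml.etree.ElementTree`, which is backed by expat. The sample document and the `fixtures` and `building_ids` helpers are hypothetical; a real OpenStudio test would feed the same bytes to the gbXML reverse translator and compare the resulting OSM instead.

```python
import codecs
import xml.etree.ElementTree as ET

# Hypothetical, minimal gbXML document used only as a test fixture.
SAMPLE = (
    '<?xml version="1.0" encoding="{enc}"?>'
    '<gbXML xmlns="http://www.gbxml.org/schema">'
    '<Campus id="campus-1"><Building id="bldg-1" buildingType="Office"/></Campus>'
    '</gbXML>'
)
NS = "{http://www.gbxml.org/schema}"


def fixtures() -> dict:
    """Return the same document as UTF-8 and as UTF-16 in both byte orders.

    The UTF-16 variants get an explicit byte order mark (BOM) so a parser
    can tell little endian (FF FE) from big endian (FE FF).
    """
    utf8 = SAMPLE.format(enc="UTF-8").encode("utf-8")
    utf16 = SAMPLE.format(enc="UTF-16")
    return {
        "utf-8": utf8,
        "utf-16-le": codecs.BOM_UTF16_LE + utf16.encode("utf-16-le"),
        "utf-16-be": codecs.BOM_UTF16_BE + utf16.encode("utf-16-be"),
    }


def building_ids(data: bytes) -> list:
    """Parse raw gbXML bytes (encoding detected by expat) and list Building ids."""
    root = ET.fromstring(data)
    return [b.get("id") for b in root.iter(NS + "Building")]


if __name__ == "__main__":
    for name, data in fixtures().items():
        # First two bytes show the BOM (or "<?" for UTF-8 without a BOM).
        print(name, data[:2].hex(), building_ids(data))
```

Feeding raw bytes rather than a pre-decoded string leaves encoding detection to the parser, which is the behavior these tests are meant to pin down.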

If the unit tests show that the core handles UTF-16 files correctly, then the remaining issue is in how UTF-16 gbXML files are passed to the embedded gbXML editor.

macumber commented 4 years ago

@pflaumingo @theo-armour @tijcolem

jasondegraw commented 4 years ago

@macumber The pugixml XML library that is in develop3 can handle UTF-16 and UTF-8, so there's always that. Or is develop3 dead?

macumber commented 4 years ago

Thanks, Jason. The first item is just to verify that we have unit tests for both UTF-8 and UTF-16. The second issue is how we pass the XML string to the embedded web viewer.

theo-armour commented 4 years ago

I note that OS > File > Import > gbXML imports UTF-16 files as desired. I assume the data is converted to UTF-8 somewhere. If so, can the same mechanism be used with the viewer?

jasondegraw commented 4 years ago

At least a couple of the libraries out there can read UTF-8/16/x but default to writing UTF-8. I googled around a bit, and it looks like QtXML does that too, so unless something else is specified, you get UTF-8 on output. The original encoding isn't carried over, even if all you've done is load the file and write it right back out. I didn't dig in too deeply, though, so I might have missed something.

theo-armour commented 4 years ago

Here is where I think OpenStudio loads the gbXML file:

https://github.com/NREL/OpenStudio/blob/21285b9a20fa3ce57b7dfecc28f3c20291c9d7bc/openstudiocore/src/openstudio_lib/GeometryEditorView.cpp#L593-L599

If m_gbXML can be coerced to always be UTF-8, that could fix the problem.
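One way that coercion could work, sketched in Python under the assumption that the file is either UTF-8 or UTF-16 (UTF-32 and legacy code pages are not handled): choose the codec from the byte order mark, or from how the leading `<` is encoded when there is no BOM, decode, and rewrite the XML declaration so it no longer claims UTF-16. The function name `gbxml_to_utf8` is hypothetical; the equivalent change in `GeometryEditorView.cpp` would be C++/Qt code that is not shown here.

```python
import codecs
import re

# Matches the encoding pseudo-attribute inside a leading XML declaration.
_DECL_ENCODING = re.compile(r'^(<\?xml[^>]*?\bencoding\s*=\s*)(["\'])[^"\']*\2')


def gbxml_to_utf8(data: bytes) -> str:
    """Hypothetical helper: decode UTF-8 or UTF-16 gbXML bytes into text whose
    XML declaration says UTF-8. The caller can then encode it as UTF-8.
    """
    if data.startswith(codecs.BOM_UTF8):
        text = data[len(codecs.BOM_UTF8):].decode("utf-8")
    elif data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM, picks the byte order, and drops the BOM.
        text = data.decode("utf-16")
    elif data.startswith(b"<\x00"):
        text = data.decode("utf-16-le")  # no BOM: "<" encoded little endian
    elif data.startswith(b"\x00<"):
        text = data.decode("utf-16-be")  # no BOM: "<" encoded big endian
    else:
        text = data.decode("utf-8")
    # The declaration may still say encoding="UTF-16"; make it match UTF-8 output.
    return _DECL_ENCODING.sub(r"\1\2UTF-8\2", text, count=1)
```

Rewriting the declaration matters because a consumer that receives UTF-8 text but reads `encoding="UTF-16"` in the prolog may reject the document or decode it incorrectly.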

joseph-robertson commented 4 years ago

@kbenne suggests that we 1. learn how to invoke the gbXML translator programmatically, 2. find a file that consistently triggers the error when encoded in UTF-16, and 3. write a unit test based on number 1.

theo-armour commented 4 years ago

@joseph-robertson

Regarding #2: I believe that many gbXML files exported by Autodesk Revit are encoded in UTF-16.

Here is a link to a sample:

https://github.com/ladybug-tools/3d-models/blob/master/gbxml-sample-files/first-batch/bristol-clifton-down-road-utf16.xml

Regarding #1: I have solved these issues in a recent version of the Spider gbXML Viewer's ZIP file extractor. See these lines of code:

https://github.com/ladybug-tools/spider-2020/blob/master/lib/fo-file-open/foz-file-open-zip-2020-07-12.js#L41-L46

I will be happy to work with you to update the code in the current OS gbXML viewer, but to help, I will need some hand-holding on what the requirements for external apps in OpenStudio V3 might be.