bldrs-ai / ifctool

Command line tools for working with IFC models
https://bldrs-ai.github.io/ifctool

Umlauts are not properly encoded #7

Closed. Adrian62D closed this issue 2 years ago

Adrian62D commented 2 years ago

Steps to reproduce:

Download the sample ifc.

Run: node src/main.js SEESTRASSE.ifc > seestrasse.json

Expected result:

Umlauts are encoded in UTF-8.

Actual result:

web-ifc: 0.0.34 threading: 0
{
  "type": "ifcJSON",
  "version": "0.0.1",
  "originatingSystem": "IFC2JSON_js 3.0.2",
  "preprocessorVersion": "web-ifc 0.0.34",
  "time": "2022-07-05T07:00:22.798Z",
  "data": [
    {
      "expressID": 86,
      "type": "IFCPROJECT",
      "children": [
        {
          "expressID": 109,
          "type": "IFCSITE",
          "GlobalId": {
            "type": 1,
            "value": "2JVRWWE0fQTJBOMjht6CZc"
          },
          "OwnerHistory": {
            "type": 5,
            "value": 28
          },
          "Name": {
            "type": 1,
            "value": "Gel\\X2\\00E4\\X0\\nde" <----------- should be encoded as "Gelände" in UTF-8
          },

pablo-mayrgundter commented 2 years ago

This is actually intended behavior when the --deref flag isn't present. I could go either way, but I lean towards preserving it. That said, --deref is something of a catch-all at this point: it turns object references into values, but it also performs this string decoding.

The IFC spec does this encoding on purpose, and I see some value in preserving this when the output is trying to relay the most literal representation.
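For anyone who wants the raw output but still needs readable names, the escapes can be decoded after the fact. The snippet below is only a rough sketch and is not how ifctool does it: `decodeStepString` is a hypothetical helper name, and it assumes the escapes follow the ISO 10303-21 string forms (`\X2\...\X0\` for 4-hex-digit UTF-16 code units, `\X4\...\X0\` for 8-hex-digit code points, `\X\HH` for a single ISO 8859-1 byte). It does not handle `\S\` or escaped backslashes.

```javascript
// Hypothetical helper, NOT part of ifctool or web-ifc.
// Assumption: input uses ISO 10303-21 string escapes as seen in IFC files.
function decodeStepString(raw) {
  return raw
    // \X2\HHHH...\X0\ : one or more 4-hex-digit UTF-16 code units
    .replace(/\\X2\\((?:[0-9A-F]{4})+)\\X0\\/g, (_, hex) =>
      hex
        .match(/.{4}/g)
        .map((h) => String.fromCharCode(parseInt(h, 16)))
        .join(''))
    // \X4\HHHHHHHH...\X0\ : one or more 8-hex-digit Unicode code points
    .replace(/\\X4\\((?:[0-9A-F]{8})+)\\X0\\/g, (_, hex) =>
      hex
        .match(/.{8}/g)
        .map((h) => String.fromCodePoint(parseInt(h, 16)))
        .join(''))
    // \X\HH : a single character given as an ISO 8859-1 byte
    .replace(/\\X\\([0-9A-F]{2})/g, (_, h) =>
      String.fromCharCode(parseInt(h, 16)));
}

// The value from the issue, as it looks once the JSON has been parsed
// (each "\\" in the JSON text is a single backslash in the string).
const raw = 'Gel\\X2\\00E4\\X0\\nde';
console.log(decodeStepString(raw));
```

Keeping this as a separate post-processing step leaves the default output literal while still giving a readable form on demand.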

pablo-mayrgundter commented 2 years ago

I think this has been addressed. Note that the deref'd version correctly decodes extended characters, including umlauts. Let me know if not.