GeertBellekens / Enterprise-Architect-Toolpack

Addins and tools for Sparx Systems Enterprise Architect
https://bellekens.com/product/bellekens-enterprise-architect-toolpack/
BSD 2-Clause "Simplified" License

EA-Matic save doesn't handle non-ascii characters properly #128

Closed baerrach closed 1 year ago

baerrach commented 1 year ago

After exporting the scripts via EA-Matic, some files in the VBScript repository show differences in their non-ASCII characters when compared:

Framework/Utils/XML.vbs
-' test = sanitizeXMLString("invali""d'str�i�ng<&>")
+' test = sanitizeXMLString("invali""d'str�i�ng<&>")

After using git to reset the file and then LoadScripts.vbs to re-import it, it turns out to be the loading process that breaks the character handling, probably in Utils/TextFile.loadContents().

baerrach commented 1 year ago

https://learn.microsoft.com/en-us/office/vba/language/reference/user-interface-help/openastextstream-method

Format: Optional. One of three Tristate values used to indicate the format of the opened file. If omitted, the file is opened as ASCII.

Constant            Value  Description
TristateUseDefault  -2     Opens the file by using the system default.
TristateTrue        -1     Opens the file as Unicode.
TristateFalse        0     Opens the file as ASCII.

TextFile.vbs:136: set ts = fsoFile.OpenAsTextStream(ForReading, TristateUseDefault)

Setting this option to TristateTrue turns the loaded contents into a binary mess. Setting it to TristateFalse behaves the same as TristateUseDefault: the non-ASCII characters come out as garbage.
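A rough way to compare the three modes side by side is to count the characters each one returns for the same file. This is only a sketch, meant to be run with cscript outside EA; `readWith` and `compareModes` are hypothetical helpers and not part of TextFile.vbs. If the file is saved as UTF-8 (an assumption, the issue does not confirm the encoding), the two ANSI modes would be expected to report more characters than the file really contains, because every multi-byte character is split into single bytes, while the Unicode mode pairs bytes up into unrelated UTF-16 code units.

```vbscript
Option Explicit

Const ForReading = 1

' Hypothetical helper: read the whole file using the given Tristate value.
Function readWith(path, tristate)
    Dim fso, ts
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.GetFile(path).OpenAsTextStream(ForReading, tristate)
    If ts.AtEndOfStream Then
        readWith = ""          ' ReadAll raises an error on an empty file
    Else
        readWith = ts.ReadAll
    End If
    ts.Close
End Function

' Hypothetical helper: print the character count for each mode.
Sub compareModes(path)
    Dim values, names, i
    values = Array(-2, -1, 0)
    names = Array("TristateUseDefault", "TristateTrue", "TristateFalse")
    For i = 0 To UBound(values)
        WScript.Echo names(i) & ": " & Len(readWith(path, values(i))) & " characters"
    Next
End Sub
```

For example, `compareModes "Framework\Utils\XML.vbs"` (adjust the path to the local checkout) prints one line per mode.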

This is getting beyond my limited vb-fu. I suspect that VB doesn't support these characters properly.

https://www.vbforums.com/showthread.php?559278-RESOLVED-Encoding-in-FileSystemObject&s=6526e81924f841235ac832deb778f309&p=3456131&viewfull=1#post3456131

The FSO is limited to "ANSI" (which in Windows-speak really means "UTF-16LE converted to 8-bit DOS encoding using the current codepage") and UTF-16LE.

If your files are not gigantic (< ~20MB) you can probably do the conversion (or create them in the first place) using the ADODB.Stream object. Things get only slightly tricky for Unicode encodings when you want to omit the BOM, but that is shown in the demo as well.

Generally you'd be better off just creating the text files using ADODB.Stream in the first place instead of converting afterward. The FSO is very limited, and really meant only for Windows Scripting.
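Following the ADODB.Stream suggestion, reading and writing the script files could look roughly like the sketch below. It assumes the files are stored as UTF-8, which the issue does not confirm, and `loadUtf8` and `saveUtf8NoBom` are hypothetical names rather than functions in the toolpack's TextFile.vbs. The writer uses the common trick of switching the stream to binary and skipping the first three bytes to drop the BOM; whether the BOM should be dropped depends on how the files in the repository were saved.

```vbscript
Option Explicit

Const adTypeBinary = 1
Const adTypeText = 2
Const adSaveCreateOverWrite = 2

' Read a whole file as text, decoding it as UTF-8 (assumed encoding).
Function loadUtf8(path)
    Dim stream
    Set stream = CreateObject("ADODB.Stream")
    stream.Type = adTypeText
    stream.Charset = "utf-8"
    stream.Open
    stream.LoadFromFile path
    loadUtf8 = stream.ReadText      ' with no argument, reads to the end
    stream.Close
End Function

' Write text as UTF-8 without the 3-byte BOM that ADODB.Stream prepends.
' Assumes contents is not empty.
Sub saveUtf8NoBom(path, contents)
    Dim textStream, binaryStream
    Set textStream = CreateObject("ADODB.Stream")
    textStream.Type = adTypeText
    textStream.Charset = "utf-8"
    textStream.Open
    textStream.WriteText contents

    ' Type can only be changed while Position is 0.
    textStream.Position = 0
    textStream.Type = adTypeBinary
    textStream.Position = 3         ' skip the BOM bytes EF BB BF

    Set binaryStream = CreateObject("ADODB.Stream")
    binaryStream.Type = adTypeBinary
    binaryStream.Open
    textStream.CopyTo binaryStream  ' copies from the current position to the end
    binaryStream.SaveToFile path, adSaveCreateOverWrite
    binaryStream.Close
    textStream.Close
End Sub
```

A round trip such as `saveUtf8NoBom path, loadUtf8(path)` should then leave the non-ASCII characters untouched, provided the file really is UTF-8.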