dbohdan / initool

Manipulate INI files from the command line
MIT License

[Feature request] handling of UTF-16 and UTF-32 encoding #16

Open ohault opened 6 months ago

ohault commented 6 months ago

As a follow-up to https://github.com/dbohdan/initool/issues/15, some level of support for the UTF-16 LE encoding would be desirable.

A first step would be to recognize a file encoded in UTF-16 or UTF-32 and to raise a specific error advising the user to convert the input file to an encoding supported by initool.

The inicomp tool, for example, displays an error message with an instruction for the user: "Error: File .................. is encoded as UTF16. Please convert it to ANSI or UTF8 first."
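The check requested above can be sketched as follows. This is a minimal Python illustration, not initool's actual code (initool is written in Standard ML). The names `BOMS`, `detect_bom`, and `check_supported` are hypothetical, and the error text simply mirrors the "unsupported encoding" message that appears later in this thread.

```python
from typing import Optional

# BOM signatures, longest first: the UTF-32 LE BOM begins with the same
# two bytes as the UTF-16 LE BOM, so UTF-32 must be tested before UTF-16.
BOMS = [
    (b"\xff\xfe\x00\x00", "UTF-32 LE"),
    (b"\x00\x00\xfe\xff", "UTF-32 BE"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xff\xfe", "UTF-16 LE"),
    (b"\xfe\xff", "UTF-16 BE"),
]


def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding name implied by a leading BOM, or None."""
    for signature, name in BOMS:
        if data.startswith(signature):
            return name
    return None


def check_supported(data: bytes) -> None:
    """Raise ValueError if the BOM marks the input as UTF-16 or UTF-32."""
    encoding = detect_bom(data)
    if encoding is not None and encoding != "UTF-8":
        raise ValueError(f"unsupported encoding: {encoding}")
```

Input without any BOM passes this check unchanged, so a BOM-less UTF-16 file would still reach the parser.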

dbohdan commented 6 months ago

Good idea to detect the BOM. I have implemented it and released v0.16.0.

Standard ML, the programming language initool is written in, and the compiler I use currently have very limited support for Unicode. I know of no library for working with encodings like UTF-16 and UTF-32 or for detecting them using heuristics. This means that opening a UTF-16 or UTF-32 file without a BOM will continue to produce an "invalid line" error for the foreseeable future. There is also no way to pass Unicode command-line arguments to initool on Windows.

I will keep this issue open in case the Unicode situation changes.
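To illustrate what a heuristic for BOM-less files might look like, here is a rough Python sketch. It is not something initool does. It relies on the assumption that the text is mostly ASCII, in which case every other byte of UTF-16 data is NUL. The function name, sample size, and thresholds are all hypothetical choices.

```python
from typing import Optional


def guess_utf16_without_bom(data: bytes, sample_size: int = 4096) -> Optional[str]:
    """Guess UTF-16 byte order from where NUL bytes fall, or return None."""
    sample = data[:sample_size]
    # Work on whole 2-byte units only.
    sample = sample[: len(sample) - len(sample) % 2]
    pairs = len(sample) // 2
    if pairs == 0:
        return None

    nul_even = sample[0::2].count(0)  # NULs at offsets 0, 2, 4, ...
    nul_odd = sample[1::2].count(0)   # NULs at offsets 1, 3, 5, ...

    # ASCII text in UTF-16 LE has its NUL (high) bytes at odd offsets;
    # in UTF-16 BE they sit at even offsets.
    if nul_odd > 0.5 * pairs and nul_even < 0.1 * pairs:
        return "UTF-16 LE"
    if nul_even > 0.5 * pairs and nul_odd < 0.1 * pairs:
        return "UTF-16 BE"
    return None
```

A heuristic like this will miss UTF-16 text that is mostly non-Latin, which is one reason a maintained detection library would be preferable to a hand-rolled check.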

ohault commented 6 months ago

I have just tested version 0.16.0 with a file, HCU-Test.reg, encoded as UTF-16 LE.

Here are the results:

C:>initool version
0.16.0

C:>initool -p get HCU-Test.reg HKEY_CURRENT_USER\Test\subkey1

C:>echo %errorlevel%
1

C:>initool -p get HCU-Test.reg HKEY_CURRENT_USER\Test\subkey1 """test_string"""

C:>echo %errorlevel%
1

C:>initool -p get HCU-Test.reg
■W i n d o w s R e g i s t r y E d i t o r V e r s i o n 5 . 0 0

[ H K E Y C U R R E N T U S E R \ T e s t ]
" b i n " = h e x : 0 0 , 0 1 , 0 0 , 1 0 , 1 0 , 1 0 , 1 0

[ H K E Y C U R R E N T U S E R \ T e s t \ s u b k e y 1 ]
" D W O R D 6 4 " = h e x ( b ) : 2 3 , f e , 5 3 , 0 0 , 0 0 , 0 0 , 0 0 , 0 0
" t e s t _ s t r i n g " = " b l a b l a b l a "

[ H K E Y C U R R E N T U S E R \ T e s t \ s u b k e y 2 ]
" M u l t i " = h e x ( 7 ) : 4 6 , 0 0 , 6 f , 0 0 , 6 f , 0 0 , 0 0 , 0 0 , 4 2 , 0 0 , 6 1 , 0 0 , 7 2 , 0 0 , 0 0 , 0 0 , 0 0 , 0 0

For the last command, I would expect initool to also return errorlevel 1 instead of printing the file.

dbohdan commented 6 months ago

Could you attach HCU-Test.reg to a comment? You may need to change the extension to .txt or put it in a ZIP archive.

ohault commented 6 months ago

Please find attached the requested file: HCU-Test.reg.txt.zip

dbohdan commented 6 months ago

Thanks.

dbohdan commented 6 months ago

This was a bug in BOM detection. Thanks for reporting it. I have fixed the bug and released version 0.17.0, which includes other improvements.

ohault commented 6 months ago

Thank you @dbohdan

C:>initool.exe version
0.17.0

C:>initool -p get HCU-Test.reg
Error: unsupported encoding: UTF-16 LE

C:>echo %errorlevel%
1

C:>initool -p get HCU-Test.reg HKEY_CURRENT_USER\Test\subkey1
Error: unsupported encoding: UTF-16 LE

C:>echo %errorlevel%
1

C:>initool -p get HCU-Test.reg HKEY_CURRENT_USER\Test\subkey1 """test_string"""
Error: unsupported encoding: UTF-16 LE

C:>echo %errorlevel%
1
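Since initool now rejects UTF-16 input, the file has to be converted before use. One possible way to do that is sketched below in Python. The function name `convert_utf16_to_utf8` and the file paths are hypothetical, and the sketch assumes the source file starts with a BOM, as HCU-Test.reg does.

```python
def convert_utf16_to_utf8(source: str, target: str) -> None:
    """Re-encode a UTF-16 text file as UTF-8 without a BOM."""
    # The "utf-16" codec reads the BOM to pick the byte order and strips it.
    # newline="" disables newline translation so line endings are preserved.
    with open(source, "r", encoding="utf-16", newline="") as src:
        text = src.read()
    with open(target, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

Called as `convert_utf16_to_utf8("HCU-Test.reg", "HCU-Test.utf8.reg")`, for example, this writes a UTF-8 copy that can be passed to initool in place of the original.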