ElektraInitiative / libelektra

Elektra serves as a universal and secure framework to access configuration settings in a global, hierarchical key database.
https://www.libelektra.org
BSD 3-Clause "New" or "Revised" License
208 stars 123 forks source link

INI with BOM loses first section #1942

Closed markus2330 closed 4 years ago

markus2330 commented 6 years ago

Steps to Reproduce the Problem

kdb mount test.ini user/test ini
echo "\xEF\xBB\xBF[sec]\x0Aa=b" > `kdb file user/test`

Expected Result

That BOM is interpreted like a whitespace (simply ignored) and would not confuse the parser.

kdb ls user/test
#> user/test/sec/a

Actual Result

It seems like BOM+[ confuses the parser so that the section is not recognized.

kdb ls user/test
#> user/test/a
#> user/test/[sec]

System Information

alexsaber commented 5 years ago

Hi! Can you please assign me to this issue? I have picked 3 more ini related problems.

markus2330 commented 5 years ago

Actually this is a generic problem where maybe several parsers will choke. E.g. I quickly tried yajl, which has a similar problem:

kdb mount test.yajl user/test yajl
echo "{\"a\": \"b\"}" > `kdb file user/test` # works
echo "\xEF\xBB\xBF{\"a\": \"b\"}" > `kdb file user/test` # fails

To solve this generically, we could implement a remove/bom in the plugin filecheck (simply stripping the BOM from the file) or simply reject files which have BOM (use the already existing reject/bom feature).

@sanssecours What do you think? Do your storage plugins support BOM?

sanssecours commented 5 years ago

What do you think?

Rejecting a file with a BOM sounds like a good idea. Stripping the BOM (and maybe re-adding it later) sounds good in theory, but would potentially require communication between the BOM and storage plugin in the set direction.

I think the best solution would be to add the iconv plugin to the backend and convert the data in the get direction into UTF-8 (without BOM) and then back to whatever encoding the file used before. If there was BOM, then iconv should just add it again in the set direction. This way storage plugins could always just assume UTF-8 encoding. For all files that use the proper format (UTF-8 without BOM) the overhead would be minimal, since iconv does not need to change the file content at all.

Do your storage plugins support BOM?

YAML CPP does support BOMs:

kdb mount config.yaml user/tests/yaml yaml
printf "\xEF\xBB\xBFkey: value" > `kdb file user/tests/yaml`
kdb ls user/tests/yaml
#> user/tests/yaml/key

. All other plugins that I wrote do not, as far as I can tell.

markus2330 commented 5 years ago

Yes, I fully agree! To fix this issue here, it is enough if all storage plugins require "filecheck" and we reject all these different malformed files.

Later (if someone really needs config files with BOM or UTF-16) we can introduce a new fileiconv plugin.

@sanssecours Should we only reject BOM or also files with inconsistent line endings or any other setting filecheck has?

@alexsaber To solve the issue, you would add filecheck in infos/needs (in basically all storage plugins) and add /reject/bom (and maybe also check/lineending, depending on @sanssecours) to be default if no config is given for filecheck.

Note: Currently the filecheck plugin always goes over the full content, even if only /reject/bom is active. If we decide that /reject/bom is default, we should optimize this case. This should be trivial.

alexsaber commented 5 years ago

Thank you for the hints and comments!

Am Do., 21. März 2019 um 18:19 Uhr schrieb markus2330 < notifications@github.com>:

Yes, I fully agree! To fix this issue here, it is enough if all storage plugins require "filecheck" and we reject all these different malformed files.

Later (if someone really needs config files with BOM or UTF-16) we can introduce a new fileiconv plugin.

@sanssecours https://github.com/sanssecours Should we only reject BOM or also files with inconsistent line endings or any other setting filecheck has?

@alexsaber https://github.com/alexsaber To solve the issue, you would add filecheck in infos/needs (in basically all storage plugins) and add /reject/bom (and maybe also check/lineending, depending on @sanssecours https://github.com/sanssecours) to be default if no config is given for filecheck.

Note: Currently the filecheck plugin always goes over the full content, even if only /reject/bom is active. If we decide that /reject/bom is default, we should optimize this case. This should be trivial.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ElektraInitiative/libelektra/issues/1942#issuecomment-475324338, or mute the thread https://github.com/notifications/unsubscribe-auth/APkYvvuRg0CC0SQKDZFF1Xis5UhDZ10qks5vY789gaJpZM4TlW_A .

sanssecours commented 5 years ago

Should we only reject BOM or also files with inconsistent line endings or any other setting filecheck has?

Most storage plugins assume Unix line endings (\n). I think it would be great, if we reject all files that use Windows (\n\r) or Classic Mac OS (\r) line endings for those plugins. Rejecting files which contain null bytes or unprintable characters does also make sense for a lot of storage plugins.

markus2330 commented 5 years ago

Yes, most likely most of them will not support anything else other than \n. For csvstorage, however, we now add support for DOS line endings, see #2508. For the others I am fine with rejecting BOM and non-UNIX line endings.

markus2330 commented 4 years ago

INI is not important anymore due to TOML