Closed WiIIiam278 closed 9 months ago
Hey, thanks for your contribution. :slightly_smiling_face:
Your PR is lacking tests in
de.exlll.configlib.FileConfigurationPropertiesTest
de.exlll.configlib.YamlWriterTest
Do you want to add them?
I've added tests and changed the encoding back to using the local environment's encoding.
Something I noticed while writing the tests: Your tests for file reading use Files.readString. This method always defaults to using UTF-8 for file reading, and not the system locale. Accordingly, these tests may fail when run in environments not set to use UTF-8 encoding due to the mismatch between this method and the BufferedFileReader, which does default to Charset.defaultCharset(). I've fixed this, and now supply Charset.defaultCharset() in TestUtils.
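For illustration, here is a minimal sketch of that mismatch, not the actual TestUtils code. The class and method names (CharsetReadSketch, readsBack) are made up for this example; it only shows that a file reads back intact when the read side uses the same charset as the write side.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of the library or its tests.
final class CharsetReadSketch {
    // Non-ASCII sample content ("Grüße"), written with escapes so the
    // source file's own encoding does not matter.
    static final String CONTENT = "greeting: Gr\u00fc\u00dfe\n";

    // Writes CONTENT with one charset, reads it back with another, and
    // reports whether the text survived the round trip.
    static boolean readsBack(Charset writeCharset, Charset readCharset) {
        try {
            Path file = Files.createTempFile("charset-sketch", ".yml");
            try {
                Files.writeString(file, CONTENT, writeCharset);
                // Passing the charset explicitly avoids the UTF-8 default
                // of Files.readString(Path).
                return Files.readString(file, readCharset).equals(CONTENT);
            } catch (CharacterCodingException e) {
                // Bytes that are malformed in the read charset.
                return false;
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Same charset on both sides: content survives.
        System.out.println(readsBack(StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1));
        // Written as ISO-8859-1, read as UTF-8: content does not survive.
        System.out.println(readsBack(StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));
    }
}
```

Supplying Charset.defaultCharset() on the test side, as described above, keeps both sides in agreement whatever the environment is set to.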
I'd like to suggest that in the next major release you change the default to using UTF-8 and make that breaking change - I think most folks reasonably expect it to default to that, and it avoids this headache all around :)
Great, thanks a lot!
Something I noticed while writing the tests...
I think the reason that they haven't failed yet is that I haven't written any test cases that have non-ASCII characters in them. Thanks for catching and fixing that!
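A small sketch of why ASCII-only test data would hide the problem: UTF-8 and ISO-8859-1 (picked here only as an example of a non-UTF-8 default) produce identical bytes for ASCII text and differ only once a non-ASCII character appears. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the library or its tests.
final class AsciiMaskSketch {
    // True when the text encodes to the same bytes in UTF-8 and ISO-8859-1.
    static boolean sameBytes(String text) {
        return Arrays.equals(
                text.getBytes(StandardCharsets.UTF_8),
                text.getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        // ASCII only: both charsets agree, so a mismatch goes unnoticed.
        System.out.println(sameBytes("key: value"));
        // "café": the accented character is 2 bytes in UTF-8, 1 in ISO-8859-1.
        System.out.println(sameBytes("key: caf\u00e9"));
    }
}
```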
I'd like to suggest that in the next major release you change the default to using UTF-8
Absolutely - the first thing I checked was whether JEP 400: UTF-8 by Default was already available in Java 17. Sadly, it wasn't.
Your PR looks good to me. :+1: Are you working on anything else or would you like me to publish version 4.4.0?
I think it's ready to go :) Thanks for the great library.
This PR adds the ability to specify a character set for reading/writing YAML, adding a new option to the Builder. It also changes the default to UTF_8, instead of the system locale.
Currently, no Charset is passed when creating a BufferedReader for reading or an OutputStreamWriter for writing, so these will default to reading/writing files in Charset.defaultCharset() (based on the system / environment locale). This is fine, assuming the end user has their system / environment locale sensibly configured to encode in UTF-8, which of course they inevitably don't; at which point all hell breaks loose with comments and strings containing international glyphs being replaced with question marks when writing.
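As a rough sketch of the question-mark symptom (not the code in this PR), the snippet below writes the same text through an OutputStreamWriter with an explicitly passed Charset. US-ASCII stands in for a badly configured default, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the library.
final class ExplicitCharsetWriteSketch {
    // Encodes text through an OutputStreamWriter that is given an
    // explicit charset instead of falling back to Charset.defaultCharset().
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String comment = "# Gr\u00fc\u00dfe"; // "# Grüße"

        // A charset that cannot represent the glyphs: each one is
        // silently replaced with '?'.
        byte[] ascii = write(comment, StandardCharsets.US_ASCII);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // # Gr??e

        // UTF-8 round-trips the text unchanged.
        byte[] utf8 = write(comment, StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(comment)); // true
    }
}
```

The replacement happens without any exception, which is why the problem tends to show up only once someone opens the written file.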