Add builder option to specify charset for file encoding

Exlll / ConfigLib

A Minecraft library for saving, loading, updating, and commenting YAML configuration files

MIT License

135 stars 17 forks source link

Add builder option to specify charset for file encoding #28

Closed WiIIiam278 closed 9 months ago

WiIIiam278 commented 9 months ago

This PR adds the ability to specify a character set for reading/writing YAML, adding a new option to the Builder. It also changes the default to UTF_8, instead of the system locale.

Currently, no Charset is passed when creating a BufferedReader for reading or an OutputStreamWriter for writing, so these will default to reading/writing files in Charset.defaultCharset() (based on the system / environment locale).

This is fine, assuming the end user has their system / environment locale sensibly configured to encode in UTF-8, which of course they inevitably don't; at which point all hell breaks loose with comments and strings containing international glyphs being replaced with question marks when writing.

Exlll commented 9 months ago

Hey, thanks for your contribution. :slightly_smiling_face:

Your PR is lacking tests in

de.exlll.configlib.FileConfigurationPropertiesTest
de.exlll.configlib.YamlWriterTest

Do you want to add them?

WiIIiam278 commented 9 months ago

I've added tests and changed the encoding back to using the local environment's encoding.

Something I noticed while writing the tests: Your tests for file reading use Files.readString. This method always defaults to using UTF-8 for file reading, and not the system locale. Accordingly, these tests may fail when run in environments not set to use UTF-8 encoding due to the mismatch between this method and the BufferedFileReader, which does default to Charset.defaultCharset(). I've fixed this, and now supply Charset.defaultCharset() in TestUtils.

I'd like to suggest that in the next major release you change the default to using UTF-8 and making that breaking change - I think most folks are making a reasonable expectation that it would default to that, and it avoids this headache all around :)

Exlll commented 9 months ago

Great, thanks alot!

Something I noticed while writing the tests...

I think the reason that they haven't failed yet is that I haven't written any test cases that have non-ASCII character in them. Thanks for catching and fixing that!

I'd like to suggest that in the next major release you change the default to using UTF-8

Absolutely - the first thing I checked was whether JEP 400: UTF-8 by Default was already available in Java 17. Sadly, it wasn't.

Your PR looks good to me. :+1: Are you working on anything else or would you like me to publish version 4.4.0?

WiIIiam278 commented 9 months ago

I think it's ready to go :) Thanks for the great library.