Kajabity / Kajabity-Tools

A collection of miscellaneous code utilities and snippets.
http://kajabity.com/kajabity-tools/

Added a test for encoding usage in non-ascii files #5

Closed. stackh34p closed this 8 years ago.

stackh34p commented 8 years ago

Added a test case showing that encoding matters when reading property files stored as UTF-8 rather than ISO-8859-1. The test confirms that an unescaped UTF-8 file loads identically to the same file processed with the native2ascii tool only if it is read with UTF-8 encoding.
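The idea behind that test can be sketched roughly as follows. This is an illustrative Python sketch, not the project's C# test code: `native2ascii`, `unescape`, and `load_properties` are hypothetical helpers written for this example, the parser handles only plain `key=value` lines, and the escaping covers only characters in the Basic Multilingual Plane.

```python
import re

def native2ascii(text: str) -> str:
    """Hypothetical stand-in for the native2ascii tool (BMP characters only)."""
    return "".join(ch if ord(ch) < 128 else "\\u%04x" % ord(ch) for ch in text)

def unescape(text: str) -> str:
    """Turn \\uXXXX escapes back into the characters they represent."""
    return re.sub(r"\\u([0-9a-fA-F]{4})", lambda m: chr(int(m.group(1), 16)), text)

def load_properties(data: bytes, encoding: str) -> dict:
    """Decode the bytes with the given encoding, then parse simple key=value lines."""
    props = {}
    for line in data.decode(encoding).split("\n"):
        if not line.strip() or line.lstrip().startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        props[unescape(key.strip())] = unescape(value.strip())
    return props

source = "greeting=Здравей\nname=café\n"
utf8_file = source.encode("utf-8")                        # unescaped, stored as UTF-8
escaped_file = native2ascii(source).encode("iso-8859-1")  # escaped, pure ASCII bytes

reference = load_properties(escaped_file, "iso-8859-1")
as_utf8 = load_properties(utf8_file, "utf-8")
as_latin1 = load_properties(utf8_file, "iso-8859-1")

print(as_utf8 == reference)    # True: correct encoding matches the escaped file
print(as_latin1 == reference)  # False: wrong encoding yields garbled values
```

Reading the UTF-8 bytes as ISO-8859-1 does not fail outright; it silently produces different strings, which is why an explicit comparison against the escaped file is a useful check.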

Kajabity commented 8 years ago

Just spotted your PR - looks marvellous. Thanks.

stackh34p commented 8 years ago

Glad you appreciate it.

I remember we discussed adding custom encoding support to the writer, but I did not have time to look into it. As far as I remember, you apply Unicode escapes when storing a property file. In that case the produced file will always contain only ASCII characters (the Unicode escapes guarantee this), so choosing an encoding would be pointless.

What might make sense is to add another method to the writer - for example WriteUnescaped - that bypasses the Unicode escape logic and stores items as-is. We could then pass various encodings to that method for the produced files. The goal is to store a properties file in a form that is still readable when opened in another editor (Notepad, or the IDE). The drawback is that one must remember to escape the file (using native2ascii, or Kajabity) before a Java project can use it. I may consider implementing this if you are interested and see good uses for it.
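To make the contrast concrete, here is a minimal Python sketch of the two writing strategies. The names `write_escaped` and `write_unescaped` are hypothetical and only mirror the proposal; they are not the Kajabity API. The sketch also ignores the other escaping rules of the properties format (backslashes, separators in keys, and so on) and assumes BMP characters and ASCII-compatible encodings.

```python
def escape_non_ascii(text: str) -> str:
    """Replace each non-ASCII character with a \\uXXXX escape (BMP only)."""
    return "".join(ch if ord(ch) < 128 else "\\u%04x" % ord(ch) for ch in text)

def write_escaped(props: dict, encoding: str) -> bytes:
    """Escape first, then encode: the result contains only ASCII bytes."""
    lines = [f"{escape_non_ascii(k)}={escape_non_ascii(v)}" for k, v in props.items()]
    return ("\n".join(lines) + "\n").encode(encoding)

def write_unescaped(props: dict, encoding: str) -> bytes:
    """Hypothetical WriteUnescaped analogue: store items as-is in the given encoding."""
    lines = [f"{k}={v}" for k, v in props.items()]
    return ("\n".join(lines) + "\n").encode(encoding)

props = {"city": "Zürich"}

escaped_utf8 = write_escaped(props, "utf-8")
escaped_latin1 = write_escaped(props, "iso-8859-1")
unescaped_utf8 = write_unescaped(props, "utf-8")
unescaped_latin1 = write_unescaped(props, "iso-8859-1")

print(escaped_utf8 == escaped_latin1)      # True: the encoding makes no difference
print(unescaped_utf8 == unescaped_latin1)  # False: the bytes depend on the encoding
print(unescaped_utf8.decode("utf-8"))      # human-readable in a UTF-8 aware editor
```

With escaping, the encoding argument changes nothing for ASCII-compatible encodings, which is the "pointless" case described above. Without escaping, the reader has to know which encoding the file was written in.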

Kajabity commented 8 years ago

You make a good point in your last sentence, and while a WriteUnescaped method would provide a degree of completeness, I am not waiting on it myself. As I don't tend to use non-ASCII properties files, I don't have any specific use cases.

On that basis I will treat the software as complete and up-version it to v0.4. Once I've done that, I want to split out the Forms classes so that the Java properties and CSV classes don't have an unnecessary dependency on Windows Forms.

stackh34p commented 8 years ago

Very well. Being able to read non-ISO-encoded properties with a custom encoding is good enough in a .NET application (at least for my needs). I doubt anyone would need to store property files in anything other than the standard encoding.

So, good luck implementing your next goals. I'd be happy to give you a hand if the need arises.