christian-schlichtherle / truelicense

An open source engine for license management on the Java Virtual Machine.
https://truelicense.namespace.global
Apache License 2.0
319 stars 66 forks source link

ObfuscatedString: Inconsistent conversion error handling between toCharArray() and toString #20

Open Dani-Hub opened 3 years ago

Dani-Hub commented 3 years ago

Class ObfuscatedString provides basically two ways to retrieve its obfuscated string data (for good reasons): Either (and traditionally) via it's toString method or via its toCharArray(). This two-fold API is nicely designed, but it turns out that both methods do use different error handling strategies: toString is implemented by means of evaluating String.String(byte[] bytes, int offset, int length, Charset charset), which is specified such that malformed-input and unmappable-character sequences are replaced by the charset's default replacement string. Contrary to this toCharArray() is implemented by creating a fresh CharsetDecorder using Charset.newDecoder(), where the specification tells us:

The default action for malformed-input and unmappable-character errors is to report them.

I'm wondering about the different error handling strategies: First, is this difference really intended? It looks astonishing on the first sight. Assuming that it is intended, I would feel that at least a second way should be provided that creates a String by means of a byte[]->char[] conversion which would also report above sort of errors (e.g. toStringChecked), possibly also a toCharArrayUnchecked.

If agreement exists to at least some of the suggested changes, I'm willing to provide a corresponding pull request.

If no consensus exists for any code changes, I think the semantic difference between both methods should be clearly documented.

christian-schlichtherle commented 3 years ago

Thank you very much for reporting this!

The difference in error handling is unintentional. Before changing something however, I wonder what the impact is? ObfuscatedString is designed to be used for string constants in the code. Would the compiler even let you enter some undecodable character sequences in a source file? I don't think so, but I can't prove it, of course.

I'm tending to think that any change should throw a RuntimeException in this case.

Dani-Hub commented 3 years ago

I tend to agree with your point of view. But since in principal the long array contents could be (presumably unintentionally) modified afterwards or when the long array data is transported (with possible transport errors) and unpacked later, there could be encoding errors due to such modifications. So would you be fine with considering a pull request that would change the semantics of toString to throw an error in case of such errors?

christian-schlichtherle commented 3 years ago

Sounds fine! I would need some copyright waiver however, but I'm honestly not prepared for that. :-)

Your explanation would even suggest some kind of Error instead of a RuntimeException.

Dani-Hub commented 3 years ago

Let me know what I precisely would need to do to meet your copyright criteria or whether you would prefer to implement it yourself. I'm expecting the necessary code changes to be rather small, would that still require an additional formalism? Other projects where had provided contributions to do typically have a gray zone where sufficiently small changes don't require a copyright waiver. But I'm not speaking against such a formal agreement.

christian-schlichtherle commented 3 years ago

Please allow for time to check out the options. This is not exactly top priority right now.

christian-schlichtherle commented 3 years ago

I have set up a simple CLA process now. Please proceed and submit your pull request. You will be asked to sign a CLA. Don't worry, it's really simple. Thanks in advance for your contribution.