BaseXdb / basex

BaseX Main Repository.
http://basex.org
BSD 3-Clause "New" or "Revised" License

Supply encoding when constructing String from byte array #2248

GuntherRademacher closed this 1 year ago

GuntherRademacher commented 1 year ago

When running mvn test from the command line, I noticed a failing test with this report:

[ERROR]   FnModuleTest.decodeFromUri:338->Sandbox.query:105->Sandbox.compare:122
 fn:decode-from-uri("%F0%9F%92%A1")
 ==> expected: <💡> but was: <💡>

The same did not happen when running that test from within Eclipse.

Further investigation showed that the two test runs use different default encodings: Windows-1252 on the command line and UTF-8 in Eclipse.
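For anyone who wants to check which default encoding a given launch environment uses, a minimal sketch follows. The class name DefaultCharsetCheck is made up for illustration and is not part of BaseX; Charset.defaultCharset() and the file.encoding system property are standard JDK facilities.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of BaseX: reports the JVM's default encoding.
public class DefaultCharsetCheck {

  /** Returns a one-line summary of the JVM's default charset settings. */
  static String describe() {
    return "defaultCharset=" + Charset.defaultCharset().name()
        + ", file.encoding=" + System.getProperty("file.encoding");
  }

  public static void main(String[] args) {
    // Run this once from the command line and once from the IDE and compare.
    System.out.println(describe());
  }
}
```

Since JDK 18, the default charset is UTF-8 unless overridden, so a difference like the one described here is mostly seen on older JDKs.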

In this case, the different encodings helped to find the cause: the implementation of fn:decode-from-uri uses the String(byte[]) constructor, which decodes the bytes with the platform's default encoding. I presume that this constructor should not be used at all, so I have replaced all uses of it with the constructor that takes an explicit encoding, passing UTF-8.
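To illustrate the difference, here is a minimal, self-contained sketch. It is not the BaseX implementation of fn:decode-from-uri; the class DecodeFromUriSketch and its methods are made up for this example, and the percent-decoding is deliberately simplified. It decodes the bytes from the failing test once as UTF-8 and once as Windows-1252 (assuming the JDK in use provides that charset).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not BaseX code: shows why the charset must be explicit.
public class DecodeFromUriSketch {

  /** Turns %XX escapes into raw bytes; all other characters are kept as UTF-8. */
  static byte[] percentDecode(String input) {
    byte[] in = input.getBytes(StandardCharsets.UTF_8);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int i = 0; i < in.length; i++) {
      if (in[i] == '%' && i + 2 < in.length) {
        int hi = Character.digit((char) (in[i + 1] & 0xFF), 16);
        int lo = Character.digit((char) (in[i + 2] & 0xFF), 16);
        if (hi >= 0 && lo >= 0) {
          out.write((hi << 4) | lo);
          i += 2; // skip the two hex digits just consumed
          continue;
        }
      }
      out.write(in[i]); // literal byte, or a malformed escape kept as-is
    }
    return out.toByteArray();
  }

  /** Decodes the escaped bytes with an explicitly supplied charset. */
  static String decode(String input, Charset charset) {
    // new String(bytes) without a charset would use the platform default here.
    return new String(percentDecode(input), charset);
  }

  public static void main(String[] args) {
    String uri = "%F0%9F%92%A1";
    String utf8 = decode(uri, StandardCharsets.UTF_8);
    String cp1252 = decode(uri, Charset.forName("windows-1252"));
    // Print lengths rather than the strings, as console encodings vary.
    System.out.println("UTF-8 code points: " + utf8.codePointCount(0, utf8.length()));
    System.out.println("Windows-1252 code points: " + cp1252.codePointCount(0, cp1252.length()));
  }
}
```

With UTF-8, the four bytes form the single code point U+1F4A1. With Windows-1252, each byte becomes its own character, which is the kind of mismatch the failing assertion reported.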

ChristianGruen commented 1 year ago

Good observation! The newly introduced code was inconsistent; I fixed it via a separate commit (10c5287281b957e336474f4c84b53f7e8bff307b). The handling of TAR files is a bit unfortunate: as the encoding cannot be derived from the source file, we usually rely on the system encoding and use UTF-8 as fallback. TAR input is used rarely enough that it is not worth introducing user options for specifying the encoding.