albfernandez / javadbf

Java library for reading and writing Xbase (dBase/DBF) files.
GNU Lesser General Public License v3.0

Use UTF in DBFWriter fails in sync mode #68

Closed jourquin closed 4 years ago

jourquin commented 4 years ago

The "OutputStream"-based constructors have no problem with the Charset being set to UTF-8. This is not true for the "File"-based (sync mode) constructors, which run the following check:

if (DBFCharsetHelper.getDBFCodeForCharset(charset) == 0) {
    throw new DBFException("Unssuported charset " + charset);
}

However, there is no DBFCode defined for UTF-8. An exception is thus thrown.

Note that the validity of the Charset is not tested in the "OutputStream" versions.

What value should the "Language driver" byte in the header take to indicate UTF-8?

jourquin commented 4 years ago

Digging a little into (recent) posts, it seems that there is no language driver code for UTF-8. The latter being a multibyte encoding built on "simple" Latin characters, any Latin charset code can be used, as the UTF-8 encoding is embedded in the string itself.
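To illustrate the point about the encoding living in the string bytes, here is a minimal sketch using only the JDK. The class and method names are hypothetical and not part of javadbf. It shows that ASCII-only text produces the same bytes in UTF-8 and ISO-8859-1, while an accented character takes more than one byte in UTF-8, which matters because DBF character fields have a fixed width in bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of javadbf.
final class Utf8InDbfSketch {

    // Number of bytes the value occupies once encoded as UTF-8.
    static int utf8Length(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    // True when the UTF-8 and ISO-8859-1 encodings are byte-for-byte identical.
    static boolean sameBytesAsLatin1(String value) {
        return Arrays.equals(
                value.getBytes(StandardCharsets.UTF_8),
                value.getBytes(StandardCharsets.ISO_8859_1));
    }
}
```

For example, `utf8Length("Liege")` is 5 while `utf8Length("Li\u00e8ge")` is 6, so a value that fits a field when counted in characters may not fit once counted in UTF-8 bytes.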

DBF files are used extensively in GIS software, as they are one component of the ESRI shapefile format. Note that, since ArcGIS 10.2.1, UTF-8 has been the default encoding for DBF files. This is also the case for other GIS software.

To solve ticket #68, I would suggest either removing the charset test or throwing an exception only if the code corresponds to an unsupported Charset that is not UTF-8:

if (DBFCharsetHelper.getDBFCodeForCharset(charset) == 0) {
    // Compare with equals() rather than != so the test does not rely on object identity
    if (!StandardCharsets.UTF_8.equals(charset)) {
        throw new DBFException("Unssuported charset " + charset);
    }
}
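The proposed check can be tried outside the library with the self-contained sketch below. It makes several assumptions: `languageDriverCode` is a hypothetical stand-in for `DBFCharsetHelper.getDBFCodeForCharset`, the lookup table holds only a small illustrative subset of language driver codes, and `IllegalArgumentException` replaces `DBFException` so the example compiles without javadbf.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical sketch; names and table contents are illustrative, not javadbf's.
final class CharsetCheckSketch {

    // Small illustrative subset of language driver codes, keyed by canonical charset name.
    private static final Map<String, Integer> CODES = Map.of(
            "IBM437", 0x01,
            "IBM850", 0x02,
            "windows-1252", 0x03);

    // Stand-in for the library lookup: returns 0 when no code is known.
    static int languageDriverCode(Charset charset) {
        return CODES.getOrDefault(charset.name(), 0);
    }

    // Rejects charsets without a known code, with UTF-8 as the only exception.
    static void checkCharset(Charset charset) {
        if (languageDriverCode(charset) == 0
                && !StandardCharsets.UTF_8.equals(charset)) {
            throw new IllegalArgumentException("Unsupported charset " + charset);
        }
    }
}
```

With this sketch, `checkCharset(StandardCharsets.UTF_8)` returns normally even though no code is defined for it, while a charset such as `StandardCharsets.UTF_16` is still rejected.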

Finally, it would be nice to have the possibility to set the default Charset in DBase.java using a static method. Something like:

protected static Charset defaultCharset = StandardCharsets.ISO_8859_1;

public static void setDefaultCharset(Charset charset) {
  defaultCharset = charset;
}
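A slightly more defensive variant of the same idea is sketched below as a standalone class. The class name `DefaultCharsetSketch` and the getter are hypothetical, and the null check and `volatile` modifier are additions that go beyond the suggestion above, under the assumption that the default might be changed from another thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Hypothetical holder class for illustration; not part of javadbf.
final class DefaultCharsetSketch {

    // volatile so that a change made by one thread is visible to others.
    private static volatile Charset defaultCharset = StandardCharsets.ISO_8859_1;

    // Replaces the default; a null argument is rejected and leaves the default unchanged.
    static void setDefaultCharset(Charset charset) {
        defaultCharset = Objects.requireNonNull(charset, "charset");
    }

    static Charset getDefaultCharset() {
        return defaultCharset;
    }
}
```

A GIS application could then call `DefaultCharsetSketch.setDefaultCharset(StandardCharsets.UTF_8)` once at startup instead of passing the charset to every reader or writer.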