databricks / containers

Sample base images for Databricks Container Services
Apache License 2.0

Encoding issue on standard runtime 11.3 with unity catalog #115

Open dsjath opened 1 year ago

Hi,

We ran into an encoding error, java.nio.charset.MalformedInputException: Input length = 1, when running a SQL SELECT from PySpark against a Unity Catalog table. The cluster was new and used a base Docker image with the newest standard runtime (11.3).

Enforcing UTF-8 encoding in the Dockerfile with ENV JAVA_TOOL_OPTIONS="-Dfile.encoding=UTF8" fixed the problem.

I don't know where the encoding mismatch happens, but I suspect a misconfiguration: the encoding the JVM in the Docker image uses when parsing strings may not match the encoding used by Unity Catalog.
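For anyone trying to narrow this down, here is a minimal sketch of one way this exception can arise. It is not a confirmed diagnosis of the cluster. It prints the JVM's default charset, then strictly decodes UTF-8 bytes with a non-UTF-8 charset. The class name EncodingCheck and the helper strictDecode are hypothetical names chosen for illustration, and US-ASCII is only an assumed stand-in for whatever default the image actually ends up with.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of any Databricks image.
public class EncodingCheck {

    // Decode bytes strictly: a fresh CharsetDecoder reports malformed input
    // by throwing instead of silently substituting replacement characters.
    static String strictDecode(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder().decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) throws Exception {
        // Shows what the JVM picked up from the environment or from -Dfile.encoding.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // "\u00e9" (e with acute accent) is two bytes in UTF-8: 0xC3 0xA9.
        byte[] utf8Bytes = "\u00e9".getBytes(StandardCharsets.UTF_8);

        // Decoding with the matching charset round-trips cleanly.
        String roundTrip = strictDecode(utf8Bytes, StandardCharsets.UTF_8);
        System.out.println("UTF-8 decode ok: " + roundTrip.equals("\u00e9"));

        // Assumption: US-ASCII stands in for a non-UTF-8 default charset.
        try {
            strictDecode(utf8Bytes, StandardCharsets.US_ASCII);
        } catch (MalformedInputException e) {
            // The first non-ASCII byte is rejected, so the reported input length is 1.
            System.out.println("US-ASCII decode failed: " + e.getMessage());
        }
    }
}
```

Running this inside the container with and without the JAVA_TOOL_OPTIONS setting should at least show whether the default charset differs between the two configurations.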