We ran into an encoding error:
java.nio.charset.MalformedInputException: Input length = 1
when running a SQL SELECT from PySpark against Unity Catalog on a new cluster that uses a base Docker image with the newest standard runtime (11.3).
By enforcing UTF-8 encoding in the Dockerfile with
ENV JAVA_TOOL_OPTIONS="-Dfile.encoding=UTF8"
we were able to fix the problem.
I don't know where the encoding mismatch happens, but I suspect a misconfiguration: the encoding the JVM in the Docker image uses for the parsed string may not match the one Unity Catalog uses.
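In case it helps anyone debugging the same thing, here is a minimal sketch for checking which encoding the driver JVM actually picked up, before and after setting JAVA_TOOL_OPTIONS. The helper names jvm_encoding_report and is_utf8 are my own (hypothetical), and it relies on the private spark.sparkContext._jvm Py4J handle, so treat it as a diagnostic aid rather than a supported API. It only inspects the driver JVM, not the executors.

```python
def jvm_encoding_report(jvm):
    """Read the JVM's encoding settings through a Py4J JVM view.

    `jvm` is assumed to be the object PySpark exposes as
    spark.sparkContext._jvm (a private attribute, used here for debugging only).
    """
    return {
        # The system property that -Dfile.encoding=UTF8 sets.
        "file.encoding": jvm.java.lang.System.getProperty("file.encoding"),
        # The charset the JVM resolved as its default.
        "default_charset": jvm.java.nio.charset.Charset.defaultCharset().name(),
    }


def is_utf8(report):
    """Return True if the reported default charset is some spelling of UTF-8."""
    name = str(report["default_charset"])
    return name.replace("-", "").replace("_", "").upper() == "UTF8"


# Hypothetical usage in a notebook attached to the cluster:
#   report = jvm_encoding_report(spark.sparkContext._jvm)
#   print(report, is_utf8(report))
```

If is_utf8 returns False on the affected cluster and True once the ENV line is in the Dockerfile, that would at least support the guess that the JVM default encoding in the image is involved.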