NVIDIA / spark-rapids-tools

User tools for Spark RAPIDS
Apache License 2.0

Ensure UTF-8 encoding for reading non-English characters #1211

Status: Closed (parthosa closed 2 months ago)

parthosa commented 2 months ago

Fixes #1209. This PR fixes an issue where unit tests fail due to encoding errors when reading non-English characters.
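To illustrate the failure mode, here is a minimal sketch (not code from this PR) that decodes the same UTF-8 bytes with two different codecs. The object name `EncodingDemo` and the sample text are assumptions made for illustration. `scala.io.Source` takes an implicit `Codec`, and when none is passed explicitly the default depends on the platform, which is presumably why the tests behave differently across environments.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import scala.io.{Codec, Source}
import scala.util.Try

// Hypothetical demo object; not part of spark-rapids-tools.
object EncodingDemo {
  // Sample text with non-ASCII characters, written with escapes so the
  // example does not depend on the encoding of this source file.
  val text: String = "caf\u00e9 \u65e5\u672c\u8a9e"
  val utf8Bytes: Array[Byte] = text.getBytes(StandardCharsets.UTF_8)

  // Decode the UTF-8 bytes with the given codec, capturing any decoding error.
  def readWith(codec: Codec): Try[String] = Try {
    val src = Source.fromInputStream(new ByteArrayInputStream(utf8Bytes))(codec)
    try src.mkString finally src.close()
  }

  // Explicit UTF-8 codec: the text round-trips unchanged.
  val asUtf8: Try[String] = readWith(Codec.UTF8)
  // Mismatched codec: decoding either fails or yields different text.
  val asAscii: Try[String] = readWith(Codec("US-ASCII"))
}
```

Passing the codec explicitly removes the dependency on the JVM's default charset, so the read behaves the same on every machine.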

Changes

Scalastyle Output

Verified the scalastyle check on the dev branch: it detects 28 occurrences of Source.from.

error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/util/ToolUtilsSuite.scala message=Use UTF8Source.from instead of Source.from line=234 column=37
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/util/ToolUtilsSuite.scala message=Use UTF8Source.from instead of Source.from line=235 column=37
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/profiling/GenerateTimelineSuite.scala message=Use UTF8Source.from instead of Source.from line=74 column=23
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/profiling/GenerateDotSuite.scala message=Use UTF8Source.from instead of Source.from line=74 column=23
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala message=Use UTF8Source.from instead of Source.from line=291 column=24
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala message=Use UTF8Source.from instead of Source.from line=328 column=24
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala message=Use UTF8Source.from instead of Source.from line=341 column=30
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala message=Use UTF8Source.from instead of Source.from line=380 column=24
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala message=Use UTF8Source.from instead of Source.from line=390 column=30
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala message=Use UTF8Source.from instead of Source.from line=416 column=24
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala message=Use UTF8Source.from instead of Source.from line=450 column=24
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala message=Use UTF8Source.from instead of Source.from line=498 column=27
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala message=Use UTF8Source.from instead of Source.from line=1249 column=28
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala message=Use UTF8Source.from instead of Source.from line=1389 column=19
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala message=Use UTF8Source.from instead of Source.from line=1558 column=28
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/planparser/SqlPlanParserSuite.scala message=Use UTF8Source.from instead of Source.from line=121 column=27
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/main/scala/org/apache/spark/sql/rapids/tool/util/RapidsToolsConfUtil.scala message=Use UTF8Source.from instead of Source.from line=98 column=19
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/main/scala/org/apache/spark/sql/rapids/tool/AppBase.scala message=Use UTF8Source.from instead of Source.from line=264 column=14
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/DriverLogProcessor.scala message=Use UTF8Source.from instead of Source.from line=43 column=17
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/PluginTypeChecker.scala message=Use UTF8Source.from instead of Source.from line=110 column=17
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/PluginTypeChecker.scala message=Use UTF8Source.from instead of Source.from line=117 column=17
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/PluginTypeChecker.scala message=Use UTF8Source.from instead of Source.from line=129 column=23
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/PluginTypeChecker.scala message=Use UTF8Source.from instead of Source.from line=136 column=25
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/PluginTypeChecker.scala message=Use UTF8Source.from instead of Source.from line=155 column=17
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/PluginTypeChecker.scala message=Use UTF8Source.from instead of Source.from line=160 column=17
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/PluginTypeChecker.scala message=Use UTF8Source.from instead of Source.from line=168 column=22
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/PluginTypeChecker.scala message=Use UTF8Source.from instead of Source.from line=170 column=22
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/PluginTypeChecker.scala message=Use UTF8Source.from instead of Source.from line=178 column=17
Processed 164 file(s)
Found 28 errors
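
The scalastyle message asks callers to use UTF8Source.from instead of Source.from. The actual UTF8Source implementation is not shown here, so the following is only a sketch of what such a wrapper could look like. The names `UTF8SourceSketch` and `UTF8SourceDemo` and all method signatures are assumptions, not the project's API.

```scala
import java.io.{File, InputStream}
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import scala.io.{BufferedSource, Codec, Source}

// Hypothetical wrapper: every factory method pins the codec to UTF-8
// instead of relying on the platform default.
object UTF8SourceSketch {
  def fromFile(path: String): BufferedSource =
    Source.fromFile(path)(Codec.UTF8)

  def fromFile(file: File): BufferedSource =
    Source.fromFile(file)(Codec.UTF8)

  def fromInputStream(in: InputStream): BufferedSource =
    Source.fromInputStream(in)(Codec.UTF8)
}

// Hypothetical usage: write UTF-8 text to a temp file and read it back.
object UTF8SourceDemo {
  def roundTrip(text: String): String = {
    val tmp = Files.createTempFile("utf8-source", ".txt")
    try {
      Files.write(tmp, text.getBytes(StandardCharsets.UTF_8))
      val src = UTF8SourceSketch.fromFile(tmp.toFile)
      try src.mkString finally src.close()
    } finally {
      Files.deleteIfExists(tmp)
    }
  }
}
```

A wrapper like this keeps the codec choice in one place, and a style rule that rejects direct Source.from calls then steers all new code through it. The wrapper itself would need to be exempted from that rule, since it is the one place that calls Source.from directly.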

Testing

Tested the changes in a CI/CD job.

Note