Fixes #1209. This PR fixes an issue where unit tests fail due to encoding errors when reading non-English characters.
Changes
Added a wrapper object UTF8Source around Source with the charset set to UTF-8.
Added a check in scalastyle_config.xml that disallows any use of Source.from* methods.
Added a helper method FSUtils.readFileContentAsUTF8() that reads a file as UTF-8 and closes the underlying resources.
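For reviewers unfamiliar with the pattern, the wrapper and helper described above could look roughly like the following. This is a minimal sketch, not the code in this PR: the object name FSUtilsSketch, the demo object, and the exact set of overloads are assumptions made for illustration.

```scala
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import scala.io.{BufferedSource, Codec, Source}

// Hypothetical sketch: the real UTF8Source in this PR may expose more methods.
object UTF8Source {
  // Pin the codec explicitly instead of relying on the JVM default charset.
  private val utf8: Codec = Codec.UTF8

  def fromFile(file: File): BufferedSource = Source.fromFile(file)(utf8)
  def fromFile(path: String): BufferedSource = Source.fromFile(path)(utf8)
}

// Hypothetical stand-in for the described FSUtils.readFileContentAsUTF8().
object FSUtilsSketch {
  def readFileContentAsUTF8(path: String): String = {
    val source = UTF8Source.fromFile(path)
    // Always close the source, even if reading throws.
    try source.mkString finally source.close()
  }
}

// Illustration only: write non-ASCII text as UTF-8 and read it back.
object UTF8SourceDemo {
  val sample: String = "caf\u00e9 \u4e16\u754c"

  def roundTrip(): String = {
    val tmp = Files.createTempFile("utf8-source-demo", ".txt")
    try {
      Files.write(tmp, sample.getBytes(StandardCharsets.UTF_8))
      FSUtilsSketch.readFileContentAsUTF8(tmp.toString)
    } finally {
      Files.deleteIfExists(tmp)
    }
  }
}
```

Passing the codec explicitly is what removes the dependency on the platform default encoding, which is the likely source of the failures when the JVM default is not UTF-8.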
Scalastyle Output
Verified the scalastyle check on the dev branch: it detects 28 occurrences of Source.from.
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/util/ToolUtilsSuite.scala message=Use UTF8Source.from instead of Source.from line=234 column=37
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/util/ToolUtilsSuite.scala message=Use UTF8Source.from instead of Source.from line=235 column=37
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/profiling/GenerateTimelineSuite.scala message=Use UTF8Source.from instead of Source.from line=74 column=23
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/profiling/GenerateDotSuite.scala message=Use UTF8Source.from instead of Source.from line=74 column=23
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala message=Use UTF8Source.from instead of Source.from line=291 column=24
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala message=Use UTF8Source.from instead of Source.from line=328 column=24
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala message=Use UTF8Source.from instead of Source.from line=341 column=30
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala message=Use UTF8Source.from instead of Source.from line=380 column=24
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala message=Use UTF8Source.from instead of Source.from line=390 column=30
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala message=Use UTF8Source.from instead of Source.from line=416 column=24
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala message=Use UTF8Source.from instead of Source.from line=450 column=24
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala message=Use UTF8Source.from instead of Source.from line=498 column=27
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala message=Use UTF8Source.from instead of Source.from line=1249 column=28
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala message=Use UTF8Source.from instead of Source.from line=1389 column=19
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala message=Use UTF8Source.from instead of Source.from line=1558 column=28
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/test/scala/com/nvidia/spark/rapids/tool/planparser/SqlPlanParserSuite.scala message=Use UTF8Source.from instead of Source.from line=121 column=27
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/main/scala/org/apache/spark/sql/rapids/tool/util/RapidsToolsConfUtil.scala message=Use UTF8Source.from instead of Source.from line=98 column=19
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/main/scala/org/apache/spark/sql/rapids/tool/AppBase.scala message=Use UTF8Source.from instead of Source.from line=264 column=14
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/DriverLogProcessor.scala message=Use UTF8Source.from instead of Source.from line=43 column=17
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/PluginTypeChecker.scala message=Use UTF8Source.from instead of Source.from line=110 column=17
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/PluginTypeChecker.scala message=Use UTF8Source.from instead of Source.from line=117 column=17
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/PluginTypeChecker.scala message=Use UTF8Source.from instead of Source.from line=129 column=23
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/PluginTypeChecker.scala message=Use UTF8Source.from instead of Source.from line=136 column=25
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/PluginTypeChecker.scala message=Use UTF8Source.from instead of Source.from line=155 column=17
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/PluginTypeChecker.scala message=Use UTF8Source.from instead of Source.from line=160 column=17
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/PluginTypeChecker.scala message=Use UTF8Source.from instead of Source.from line=168 column=22
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/PluginTypeChecker.scala message=Use UTF8Source.from instead of Source.from line=170 column=22
error file=/Users/psarthi/Work/spark-rapids-tools/core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/PluginTypeChecker.scala message=Use UTF8Source.from instead of Source.from line=178 column=17
Processed 164 file(s)
Found 28 errors
Testing
Tested the changes in a CI/CD job.
Note
The scalastyle check is intentionally broad: it matches Source.from rather than specific methods (e.g. Source.fromFile), so that any new method used in the future (e.g. Source.fromURI) is caught automatically.
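The difference between a broad and a narrow pattern can be sketched in plain Scala. The regexes below are assumptions for illustration; the actual pattern in scalastyle_config.xml is not reproduced in this description.

```scala
import scala.util.matching.Regex

object SourceFromCheckSketch {
  // Assumed patterns, not the ones in scalastyle_config.xml.
  // The leading \b keeps UTF8Source.from from being flagged, because there is
  // no word boundary between "UTF8" and "Source".
  val broad: Regex = """\bSource\.from""".r
  val narrow: Regex = """\bSource\.fromFile\b""".r

  // True when the pattern occurs anywhere in the given line of code.
  def flagged(pattern: Regex, line: String): Boolean =
    pattern.findFirstIn(line).isDefined
}
```

Under these assumed patterns, a line calling Source.fromURI is flagged by the broad pattern but slips past the narrow one, which is the case the broad check is meant to cover.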