One concern with this change: the spark-rapids plugin only supports UTF-8 encoded data when loading CSV (https://github.com/NVIDIA/spark-rapids/blob/branch-23.12/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuCSVScan.scala#L121), so we may see a performance drop when converting with the plugin.

A better solution is to scan all tables for special characters and specify ISO-8859 only for the tables that actually contain international characters. I will check how many tables are affected; if it's only a few, I'll apply this targeted treatment.
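As a rough sketch of that targeted treatment (not code from this PR: the `isValidUtf8` helper is hypothetical, and it assumes Spark's standard CSV `encoding` read option), the per-table check could look like this:

```scala
import java.nio.ByteBuffer
import java.nio.charset.{CharacterCodingException, StandardCharsets}
import java.nio.file.{Files, Paths}

import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper: true if the file's bytes decode cleanly as UTF-8.
// A fresh CharsetDecoder reports malformed input by throwing, so any
// ISO-8859-only byte sequence fails the decode.
def isValidUtf8(path: String): Boolean = {
  val bytes = Files.readAllBytes(Paths.get(path))
  try {
    StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes))
    true
  } catch {
    case _: CharacterCodingException => false
  }
}

// Only tables whose raw data fails UTF-8 decoding get the ISO-8859-1
// option; all other tables stay on the default UTF-8 path that
// GpuCSVScan can handle on the GPU.
def readTableCsv(spark: SparkSession, path: String): DataFrame = {
  val reader = spark.read.option("header", "false")
  if (isValidUtf8(path)) reader.csv(path)
  else reader.option("encoding", "ISO-8859-1").csv(path)
}
```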
Closes #170.
Before this change:

After: