Spark 3.1 has changed the behaviour of the CSV reader: it now decides whether to stop parsing at the delimiter based on the value of the `unescapedQuoteHandling` option.
spark-rapids needs to ensure that reading CSV tables through the plugin honours the `unescapedQuoteHandling` setting.
For context, our CSV parser does not match Spark's all that closely. We should test this, but we may just end up documenting an incompatibility.
This arises from an audit of https://github.com/apache/spark/commit/433ae9064f.
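As a sketch of the behaviour in question (the file path, schema, and app name below are illustrative; the option values are the ones Spark's CSV source accepts, backed by the univocity parser):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative sketch of reading CSV with Spark 3.1's unescapedQuoteHandling
// option. Accepted values: STOP_AT_CLOSING_QUOTE, BACK_TO_DELIMITER,
// STOP_AT_DELIMITER (the default), SKIP_VALUE, RAISE_ERROR.
val spark = SparkSession.builder().appName("csv-quote-handling").getOrCreate()

val df = spark.read
  .option("header", "true")
  .option("quote", "\"")
  // Stop parsing the field at the next delimiter when an unescaped
  // quote is found inside a quoted value.
  .option("unescapedQuoteHandling", "STOP_AT_DELIMITER")
  .csv("/path/to/table.csv")
```

The plugin's CSV reader would need to produce the same field boundaries as the CPU path for each of these option values, or fall back / document the difference where it cannot.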
More info in the JIRA: https://issues.apache.org/jira/browse/SPARK-33566