[BUG] from_json and scan json do not replace \u escaped chars in nested data returned as a string.

NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs

https://nvidia.github.io/spark-rapids

Apache License 2.0


[BUG] from_json and scan json do not replace \u escaped chars in nested data returned as a string. #11632

Open revans2 opened 1 month ago

revans2 commented 1 month ago

Describe the bug

When from_json or a JSON scan returns nested data as a string column, and a string inside that data contains a \u escaped character sequence, Spark replaces the \u sequence with the smallest replacement possible. That may be the regular character itself or a shorter escape sequence like \r or \n. Our code does not do this.
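To make the expected normalization concrete, here is a minimal sketch in plain Python. It is not Spark or plugin code: it uses the standard `json` module to parse and re-serialize the nested value, which shows the same kind of rewrite described above. The helper name `normalize_nested_json` and the sample input are hypothetical, and the exact set of characters Spark rewrites may differ from what Python's serializer does.

```python
import json


def normalize_nested_json(raw: str) -> str:
    """Parse JSON text and re-serialize it compactly.

    Hypothetical helper for illustration only. Parsing decodes every
    \\uXXXX escape; serializing with ensure_ascii=False writes printable
    characters literally and uses short escapes such as \\n for the
    control characters that have one.
    """
    parsed = json.loads(raw)
    return json.dumps(parsed, ensure_ascii=False, separators=(",", ":"))


# Nested value as it might appear in the input: "A" and a newline,
# both written as \u escapes (raw string keeps the backslashes).
raw = r'{"b": "\u0041\u000a"}'
normalized = normalize_nested_json(raw)

print(raw)         # input text with \u escapes
print(normalized)  # re-serialized text with the shorter forms
```

With this input, the \u0041 escape becomes the literal character A and \u000a becomes the two-character escape \n, which is the shape of output the report says Spark produces and the plugin does not.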