NVIDIA / spark-rapids-tools

User tools for Spark RAPIDS
Apache License 2.0

[FEA] Revisit string formats in tools #1136

Open amahussein opened 3 months ago

amahussein commented 3 months ago

Is your feature request related to a problem? Please describe.

After adding StringUtils.renderStr() in https://github.com/NVIDIA/spark-rapids-tools/pull/1135, we need to review all the string content in the output. Every string cell should call escapeMeta in a consistent way. This can be done by either:

  1. calling StringUtils.renderStr() inside StringUtils.reformatCSVString, but this must be done only once, during the generation of the CSV file. The advantages of this approach are its simplicity and that it catches all fields at the very end.
  2. calling StringUtils.renderStr(doEscape=true) during the creation of each field. The drawback of this strategy is that it is error-prone: each time a new field is created, it has to be processed the same way.
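
To make the trade-off concrete, here is a minimal Scala sketch of both options. It is only an illustration: the object name `StringFormatSketch`, the `Row` case class, `makeRow`, and the bodies of `escapeMeta`, `renderStr`, and `reformatCSVString` are all assumptions, not the actual code or signatures in spark-rapids-tools.

```scala
// Hypothetical stand-ins for the helpers named in this issue; the real
// signatures and behavior in spark-rapids-tools may differ.
object StringFormatSketch {
  // Assumed behavior: turn backslashes and control characters into visible escapes.
  def escapeMeta(s: String): String =
    s.replace("\\", "\\\\")
      .replace("\n", "\\n")
      .replace("\t", "\\t")
      .replace("\r", "\\r")

  // Assumed shape of renderStr: null-safe, with optional escaping.
  def renderStr(s: String, doEscape: Boolean = false): String = {
    val safe = if (s == null) "" else s
    if (doEscape) escapeMeta(safe) else safe
  }

  // Option 1: escape exactly once, at the point where the CSV cell is written.
  def reformatCSVString(s: String): String =
    "\"" + renderStr(s, doEscape = true).replace("\"", "\"\"") + "\""

  // Option 2: every field creation site has to remember to escape.
  final case class Row(appName: String, description: String)

  def makeRow(appName: String, description: String): Row =
    Row(renderStr(appName, doEscape = true), renderStr(description, doEscape = true))
}
```

In this sketch, a field that has already been escaped by `makeRow` and is then passed through `reformatCSVString` gets its backslashes escaped a second time, which is one reason the two options should not be mixed and why option 1 has to run only once.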