Closed vinceecws closed 3 years ago
I believe we resolved this yesterday when Vince added a delimiter parameter to IndexUtil.write.
This issue was opened because we were unaware that Spark internally handles delimiter conflicts in field values by wrapping the entire field in quotation marks.
Using the example above: if the delimiter specified in Spark is "," and the field itself contains ",", e.g. com,hellosummerville)/jobs/law-enforcement-security, Spark's DataFrameWriter automatically handles the conflict by wrapping the field in quotation marks, like so: "com,hellosummerville)/jobs/law-enforcement-security".
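This quoting behavior follows the standard CSV convention (RFC 4180), which Spark's DataFrameWriter also applies by default. A minimal sketch using Python's stdlib csv module (chosen here only for illustration, since it implements the same convention) shows how a field containing the delimiter gets wrapped in quotation marks while other fields are left bare:

```python
import csv
import io

# Write one row where the first field contains the "," delimiter.
# The second column value ("20200101") is a made-up placeholder,
# not taken from the actual index data.
buf = io.StringIO()
writer = csv.writer(buf, delimiter=",", quotechar='"',
                    quoting=csv.QUOTE_MINIMAL)
writer.writerow(["com,hellosummerville)/jobs/law-enforcement-security",
                 "20200101"])

# Only the field containing the delimiter is quoted:
# "com,hellosummerville)/jobs/law-enforcement-security",20200101
print(buf.getvalue().strip())
```

A reader that splits lines naively on "," would still break on such rows, which is why a delimiter that never appears in the field values is the safer choice.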
The SURT URL column contains "," in its values.
So a comma-delimited output format for FilteredIndex would not be ideal; a better choice is to delimit by whitespace.