Open cseppan opened 1 year ago
Made changes to the GenericExporter class to strip off new line characters within a header item.
I tested this updated code using a dummy dataset with a description like this:
#FORMAT=FF10_POINT
abc
#COUNTRY=US
#YEAR 2030
The exported output file looks like this:
#FORMAT=FF10_POINT ab#COUNTRY=US
#YEAR 2030
Looking through the revised code:
protected void writeHeaders(PrintWriter writer, Dataset dataset, DataFormatFactory localDataFormatFactory) throws SQLException {
    String header = dataset.getDescription();
    String cr = System.getProperty("line.separator");
    if (header != null && !header.trim().isEmpty()) {
        StringTokenizer st = new StringTokenizer(header, "#");
        String lasttoken = "";
        while (st.hasMoreTokens()) {
            lasttoken = st.nextToken();
            if (!(StringUtils.isNotBlank(lasttoken) && lasttoken.substring(0, lasttoken.length() - 2).contains(cr))) {
                writer.print("#" + lasttoken);
            } else {
                writer.print("#" + lasttoken.substring(0, lasttoken.length() - 2).replace(cr, " ") + lasttoken.substring(lasttoken.length() - 1, lasttoken.length() - 1));
            }
        }
        if (lasttoken.indexOf(cr) < 0)
            writer.print(cr);
    }
    printExportInfo(writer, localDataFormatFactory);
}
It looks like the updated code expects line breaks to be two characters long, which they would be if the server were running on Windows (CR+LF), but not on Linux or macOS (LF only).
lasttoken.substring(0, lasttoken.length() - 2)
Also, this chunk of code seems like it's trying to output the line break but it'll return a zero-length string since endIndex = beginIndex.
lasttoken.substring(lasttoken.length() - 1, lasttoken.length() - 1)
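Both points can be checked in isolation. The sketch below is not part of the EMF code: `SubstringCheck` is a hypothetical class name, and the token is assumed to use single-character (LF) line breaks, as it would on Linux or macOS.

```java
class SubstringCheck {
    // Same expression as in the revised writeHeaders: begin index equals end index.
    static String tail(String token) {
        return token.substring(token.length() - 1, token.length() - 1);
    }

    // Same "drop the trailing line break" expression, which assumes a two-character break.
    static String body(String token) {
        return token.substring(0, token.length() - 2);
    }

    public static void main(String[] args) {
        // First token of the dummy description after splitting on "#", with LF breaks.
        String token = "FORMAT=FF10_POINT\nabc\n";
        System.out.println("tail length: " + tail(token).length());
        System.out.println("body: " + body(token).replace("\n", " "));
    }
}
```

With LF line breaks, `tail` is always empty and `body` cuts off the last visible character along with the line break, which is consistent with the `ab#COUNTRY=US` seen in the exported file.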
One other potential issue: I'm not sure if line breaks are converted to a consistent value before being stored in the database. For instance, if someone sets a dataset description on Windows, do the line breaks get saved as CR+LF? If so, working with the system's line separator property wouldn't be enough.
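If stored descriptions can contain CR+LF, CR, or LF, one option is to match all three explicitly instead of relying on `line.separator`. A minimal sketch, assuming a hypothetical helper class `LineBreaks` that does not exist in the EMF code base:

```java
class LineBreaks {
    // CR+LF is listed first so a Windows line break counts as one break, not two.
    private static final String ANY_BREAK = "\\r\\n|\\r|\\n";

    // Converts every line break in the text to a single LF.
    static String normalize(String text) {
        return text.replaceAll(ANY_BREAK, "\n");
    }

    public static void main(String[] args) {
        // A description with mixed line endings, as might come from different clients.
        String mixed = "#FORMAT=FF10_POINT\r\nabc\r#COUNTRY=US\n#YEAR 2030";
        System.out.println(normalize(mixed));
    }
}
```

Normalizing once up front would let the rest of the export logic deal with a single line-break character regardless of where the description was entered.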
Apart from the line ending handling, another issue is the column header row, which shouldn't get modified. If the dataset description looked like this:
# DESC some text
and more text
"country_cd","region_cd","tribal_code"
Ideally it would be output as
# DESC some text and more text
"country_cd","region_cd","tribal_code"
This seems like something only a user would be able to accurately identify. For now, I'm going to revert commit 525cb15 for the v4.3 release.
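For illustration only, the desired output above could be approximated with a heuristic: fold a line into the previous one only when the previous line starts with `#` and the current line starts with neither `#` nor a double quote. This is an assumption rather than a reliable rule (a continuation line could start with a quote, and column headers need not be quoted), and `HeaderFolder` is a hypothetical name, not an EMF class.

```java
import java.util.ArrayList;
import java.util.List;

class HeaderFolder {
    // Returns the description as output lines, with continuation lines folded
    // into the preceding "#" header item.
    static List<String> fold(String description) {
        List<String> out = new ArrayList<>();
        // Split on CR+LF, CR, or LF so the result does not depend on the platform.
        for (String line : description.split("\\r\\n|\\r|\\n")) {
            if (line.trim().isEmpty()) {
                continue; // assumption: blank lines are dropped
            }
            // Heuristic: "#" starts a header item, a double quote starts the column header row.
            boolean startsNewItem = line.startsWith("#") || line.startsWith("\"");
            boolean canAppend = !out.isEmpty() && out.get(out.size() - 1).startsWith("#");
            if (!startsNewItem && canAppend) {
                int last = out.size() - 1;
                out.set(last, out.get(last) + " " + line.trim());
            } else {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String description = "# DESC some text\r\nand more text\n"
                + "\"country_cd\",\"region_cd\",\"tribal_code\"";
        for (String line : fold(description)) {
            System.out.println(line);
        }
    }
}
```

An unquoted column header row would be folded into the preceding header item by this sketch, which is the case that only a user can reliably identify.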
Exported a dataset with a line break in the description. The exported file starts with the lines below.
Trying to import this into another EMF system gave the error:
Exception: Number of columns in the column header doesn't match the file format (expected:33 but was:1). Hint: correct header typos or set "Dataset Type" keyword, EXPORT_COLUMN_LABEL, to false if there is no column header
Probably should just remove any line breaks in the description when exporting datasets.