NCAR / ncl

The NCAR Command Language (NCL) is a scripting language for the analysis and visualization of climate and weather data.
http://www.ncl.ucar.edu

Handle both DOS and Unix line endings in text files #12

Open Dave-Allured opened 5 years ago

Dave-Allured commented 5 years ago

Please update readAsciiTable and similar NCL text input functions to automatically recognize and remove both DOS and Unix line endings. Thank you.

A conservative approach would be to recognize and strip the two cases that are each equivalent to a single line termination: LF alone (Unix/Linux) and CR-LF (DOS). Both types are common in the Real World.

I would also suggest that you accept a mixture of both LF and CR-LF in the same file. This easily happens when pieces of different files are pasted together. In other words, don't try to determine a single line ending rule for an entire file.

Less commonly, there are other oddball cases, such as CR alone. I suggest leaving these cases untouched until they are encountered in a real use case.
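
To make the proposed rule concrete, here is a minimal sketch in Python. It is only an illustration, not NCL source code; the function name `split_text_lines` is hypothetical. It treats LF and CR-LF as terminators line by line, so mixed files work, and it leaves a lone CR untouched.

```python
def split_text_lines(data: bytes) -> list[bytes]:
    """Split raw file contents into logical lines.

    Only LF and CR-LF count as line terminators. The decision is made
    per line, so a file may mix both styles. A CR that is not directly
    followed by LF is left in place.
    """
    pieces = data.split(b"\n")
    # Text after the final LF; empty when the file ends with a terminator.
    last = pieces.pop()
    # Every remaining piece was terminated by LF; drop one CR before it.
    lines = [p[:-1] if p.endswith(b"\r") else p for p in pieces]
    if last:
        # Unterminated final line: keep it exactly as found.
        lines.append(last)
    return lines


mixed = b"a 1\nb 2\r\nc 3\n"      # LF and CR-LF in the same file
lone_cr = b"x\ry\n"               # oddball case: CR alone inside a line
print(split_text_lines(mixed))
print(split_text_lines(lone_cr))
```

Splitting on LF first and then removing at most one preceding CR avoids any attempt to guess a single line ending convention for the whole file.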

Dave-Allured commented 5 years ago

I took a second look at Xiaoming's current use case:

http://mailman.ucar.edu/pipermail/ncl-talk/2018-November/013502.html

This might actually be one of those oddball cases. To produce the reported behavior with the current NCL version, the text file would probably have to contain the sequence CR CR LF. We do not know for sure because there was no file attachment or file dump.

The problem is we can't be sure what the real intention is here. Should that sequence represent one logical text line ending, or two? The three character sequence CR CR LF is certainly not a valid single line termination in any convention I have heard of. In some use cases, blank lines are important. Worse, we don't even know the exact file contents. What process ever created a file with this weird sequence?

My advice for this case is to process it using the general rules in my original proposal: strip off any CR-LF endings, and leave the remaining CR in place as an anomaly to be handled by the user.
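
As a rough illustration of what that advice would do to a CR CR LF file, here is a small Python sketch. The sample bytes are made up, since the real file contents are unknown, and this is not how NCL itself is implemented.

```python
import re

# Hypothetical file contents; the real file was never attached.
sample = b"1 2 3\r\r\n4 5 6\r\r\n"

# Treat LF and CR-LF as terminators, consuming at most one CR before each LF.
records = re.split(rb"\r?\n", sample)
if records and records[-1] == b"":
    # The final terminator ends the last line; it does not start a new one.
    records.pop()

# Lines that still end in CR are the anomaly left for the user to handle.
leftover = [r for r in records if r.endswith(b"\r")]

print(records)
print(len(leftover), "line(s) still end with a stray CR")
```

Under these assumptions each logical line keeps one trailing CR, no blank lines are invented, and the user can decide what the stray character was supposed to mean.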