Open aboodman opened 8 years ago
@kyleder's going to look at this!
One of the issues here appears to be that the CSV (which was generated by Mac Excel) uses carriage returns instead of line feeds as the line terminator. From what I've read, Go's CSV reader doesn't treat a bare CR as a line ending, so once I have some time tonight I'll look for an efficient way to convert the CRs to LFs. Another user complained about the same issue here, which seems like a good place to start.
I believe that there is an additional issue with unbalanced quotations in some of the fields that is breaking the parser, but I want to make sure that the CR issue is resolved before tackling that.
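For anyone who wants to reproduce the carriage-return problem described above without the original file, here is a minimal sketch. The sample data and the `parseCSV` helper are made up for illustration; only `encoding/csv` is real. On the Go versions I'm aware of, the CR-only input is not split into rows, so the record counts for the two inputs should differ.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseCSV is a hypothetical helper that reads every record from input
// using the standard library CSV reader with default settings.
func parseCSV(input string) ([][]string, error) {
	return csv.NewReader(strings.NewReader(input)).ReadAll()
}

func main() {
	// Two rows terminated by bare CRs, as Mac Excel might write them.
	crOnly := "a,b,c\r1,2,3\r"
	// The same two rows terminated by LFs.
	lfOnly := "a,b,c\n1,2,3\n"

	for _, in := range []string{crOnly, lfOnly} {
		records, err := parseCSV(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// Print the raw input (escaped) next to the number of records found.
		fmt.Printf("%q -> %d record(s)\n", in, len(records))
	}
}
```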
Oh yeah, we saw that on a different file, here: https://github.com/attic-labs/attic/issues/446.
@arv and @cmasone-attic looked at that package, but they did not like it because it blindly converts every CR to LF. That means files that use CRLF (which is the standard) will end up with LFLF, which is also going to break things.
We need a converter that changes CR into LF only when it appears on its own, not part of CRLF.
I think the right thing to do here is to implement an io.Reader that wraps bufio.Reader and does the conversion. Then the CSV parser can be changed to take the CR-killing reader as input rather than a standard reader.
Regarding macreader blindly replacing CRs, I think that's OK since the CSV reader will ignore blank lines. But I think it's irrelevant anyway, because I extended the examples that ctessum provided to include one with CRLFs and the result was identical:
```go
// testFile2 is a CSV file with CRLF line endings.
testFile2 := bytes.NewBufferString("a,b,c\r\n1,2,3\r\n").Bytes()
r3 := csv.NewReader(New(bytes.NewReader(testFile2)))
lines3, err := r3.ReadAll()
fmt.Printf("With macreader (CRLF): %#v\n", lines3)
if !reflect.DeepEqual(lines2, lines3) {
	t.Error("Expected CR result to match CRLF result")
}
```
I don't know enough about how this works to say with any certainty that this is acceptable, but on the surface it seems that way.
I'm going to try integrating macreader with noms to see if it fixes the CR issue and I'll report back.
One issue with using macreader as is, is that it will replace \r\n inside quoted fields with \n\n, which seems bad to me.
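To make the quoted-field concern concrete, here is a small sketch. It does not use macreader itself; `blindCRToLF` is a hypothetical stand-in that does an unconditional CR to LF replacement, and `firstField` is a made-up helper. The two printed values can be compared to see whether the line break inside the quoted field gets doubled.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// blindCRToLF is a hypothetical stand-in for an unconditional converter:
// it replaces every CR with LF, including CRs that are part of CRLF.
func blindCRToLF(b []byte) []byte {
	return bytes.ReplaceAll(b, []byte("\r"), []byte("\n"))
}

// firstField parses data as CSV and returns the first field of the
// first record.
func firstField(data []byte) (string, error) {
	records, err := csv.NewReader(bytes.NewReader(data)).ReadAll()
	if err != nil {
		return "", err
	}
	if len(records) == 0 || len(records[0]) == 0 {
		return "", fmt.Errorf("no fields found")
	}
	return records[0][0], nil
}

func main() {
	// One record whose first field is quoted and contains a CRLF.
	input := []byte("\"line1\r\nline2\",x\r\n")

	untouched, err := firstField(input)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	converted, err := firstField(blindCRToLF(input))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("untouched:       %q\n", untouched)
	fmt.Printf("blind CR to LF:  %q\n", converted)
}
```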
@arv That's a good point.
I submitted a pull request to macreader with a change that leaves CRLFs alone (Go seems to handle those fine). I verified that the change doesn't break quoted CRLFs.
Fixing the CR issue fixes the FAA data import, which now works locally with my fork of macreader.
That's cool that the FAA data works with the changed macreader, and cool to fix the bug there. I think there is an issue with your patch to macreader, but I left a comment about it over there.