FasterXML / jackson-dataformat-csv

(DEPRECATED) -- moved under: https://github.com/FasterXML/jackson-dataformats-text

Doesn't handle whitespace outside of quoted values correctly #19

Open tomdz opened 11 years ago

tomdz commented 11 years ago

When parsing a CSV file like:

"foo", "bar", "baz"
"baz", "foo", "bar"

the CSV parser will get confused and give me back exactly two values:

foo

and

 bar, baz
baz, foo, bar

(note the leading space here).

According to RFC 4180, these spaces should be considered part of the value, i.e. it should return 'foo', ' bar', ' baz' and 'baz', ' foo', ' bar'. Alternatively, perhaps via a feature, it could trim the whitespace outside of quoted strings, i.e. 'foo', 'bar', 'baz' and 'baz', 'foo', 'bar'.
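
The two outcomes described above can be sketched in plain Java. This is only an illustration and not jackson-dataformat-csv code: the class `LenientCsvLine` and its `split` method are hypothetical names, and keeping whitespace that precedes an opening quote as part of the value is the reading proposed in this report, assumed here rather than confirmed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of jackson-dataformat-csv.
final class LenientCsvLine {

    // Splits one line on commas. Quotes group characters, and a doubled quote
    // inside a quoted section yields one literal quote. With trim == true,
    // whitespace outside the quoted section is dropped; otherwise it is kept.
    static List<String> split(String line, boolean trim) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean started = false; // a quote or non-whitespace character was seen
        int keep = 0;            // prefix length that trimming must not touch
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    field.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    field.append('"'); // escaped quote
                    i++;
                } else {
                    inQuotes = false;
                    keep = field.length(); // protect the quoted text
                }
            } else if (c == '"') {
                inQuotes = true;
                started = true;
            } else if (c == ',') {
                fields.add(finish(field, keep, trim));
                field.setLength(0);
                started = false;
                keep = 0;
            } else if (trim && !started && Character.isWhitespace(c)) {
                // leading whitespace outside quotes: skipped when trimming
            } else {
                field.append(c);
                started = true;
            }
        }
        fields.add(finish(field, keep, trim));
        return fields;
    }

    // Drops trailing whitespace when trimming, but never text that was quoted.
    private static String finish(StringBuilder field, int keep, boolean trim) {
        int end = field.length();
        if (trim) {
            while (end > keep && Character.isWhitespace(field.charAt(end - 1))) {
                end--;
            }
        }
        return field.substring(0, end);
    }

    public static void main(String[] args) {
        String line = "\"foo\", \"bar\", \"baz\"";
        System.out.println(split(line, false)); // spaces kept in the values
        System.out.println(split(line, true));  // spaces outside quotes dropped
    }
}
```

Either way the line yields three values; the two modes differ only in whether the space after each comma survives.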

cowtowncoder commented 11 years ago

Quick note: trimming is already supported with CsvParser.Feature.TRIM_SPACES, see: http://fasterxml.github.io/jackson-dataformat-csv/javadoc/2.2.0/com/fasterxml/jackson/dataformat/csv/CsvParser.Feature.html#TRIM_SPACES

But I'll see what's up with eating of spaces...

cowtowncoder commented 11 years ago

Hmmh. I am guessing that some spaces are missing from the example, due to Markdown? If so, could you add an example that uses, say, underscores to denote where spaces are. I need to write a unit test to verify what gives, should be an easy thing to solve.

cowtowncoder commented 11 years ago

Actually it looks like I can reproduce this on my own.

cowtowncoder commented 11 years ago

Hmmh. Reading through RFC 4180, I do not see a definition of whether spaces are allowed outside quotes in the way described. But I think it would make sense to handle them in an intuitive way.

FWIW, enabling TRIM_SPACES should solve your specific problem, I think, until I fix the issue for the un-trimmed case.

I assume that spaces outside of quotes should be trimmed anyway; it does not make sense to leave them.

qrlodhi commented 9 years ago

Any update on this issue?

I have the same issue. This specific case (where the delimiter is a comma) is solved by using CsvParser.TRIM_SPACES as stated above, but that messes things up when the input delimiter is a space. I can use two different mappers for different delimiters, but then the indexes of fields change if the delimiter changes. So it would be nice to see these spaces handled by the Jackson CSV parser.

cowtowncoder commented 9 years ago

Unfortunately no update yet. I realize this is an important feature, and hope to address it. Interesting note on spaces, thank you for mentioning this; I hadn't thought this would be commonly done.