FasterXML / jackson-dataformat-csv

(DEPRECATED) -- moved under: https://github.com/FasterXML/jackson-dataformats-text

Unexpected character causes exception for hasNext (2.8.3) #150

Open michaelkrog opened 7 years ago

michaelkrog commented 7 years ago

I have a third-party CSV file that I'm trying to import. Its quotes are not escaped, so I have data like this:

124785285,"PLACE","","Pindsvineplejerne, Dyreværnsforening Af 03 Januar 2016".","","","","22334455","OKMO","PHONE_NORMAL_MOBILE","0",1,0,"185","01305",12,"Diesen Alle","",,"","1234","Andeby","","","","","","","08-06-2017 00:00:00",1561

(Notice the 4th column which has the syntax "{text}".")

Parsing this particular line causes an exception that is hard to recover from, because it is thrown when asking whether the iterator has more elements.

Stacktrace:

Caused by: com.fasterxml.jackson.core.JsonParseException: Unexpected character ('7' (code 55)): Expected separator ('"' (code 34)) or end-of-line
 at [Source: com.fasterxml.jackson.dataformat.csv.impl.UTF8Reader@741833c; line: 2716427, column: 92]
    at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1702)
    at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:558)
    at com.fasterxml.jackson.core.base.ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:456)
    at com.fasterxml.jackson.dataformat.csv.CsvParser._reportUnexpectedCsvChar(CsvParser.java:1089)
    at com.fasterxml.jackson.dataformat.csv.impl.CsvDecoder._nextQuotedString(CsvDecoder.java:838)
    at com.fasterxml.jackson.dataformat.csv.impl.CsvDecoder.nextString(CsvDecoder.java:601)
    at com.fasterxml.jackson.dataformat.csv.CsvParser._skipUntilEndOfLine(CsvParser.java:916)
    at com.fasterxml.jackson.dataformat.csv.CsvParser.nextToken(CsvParser.java:532)
    at com.fasterxml.jackson.databind.MappingIterator._resync(MappingIterator.java:391)
    at com.fasterxml.jackson.databind.MappingIterator.hasNextValue(MappingIterator.java:235)
    at com.fasterxml.jackson.databind.MappingIterator.hasNext(MappingIterator.java:180)
    ... 56 common frames omitted

cowtowncoder commented 7 years ago

Would it be possible to wrap this in a unit test and check whether skipping still fails with 2.8.9? There have been some improvements in patch versions. I think it should be possible to recover from this problem, ideally losing only the rest of the line (although in some cases sync may only occur with more data, losing the next line too).

The exception message looks odd too; perhaps the sample line and the exception are not from the same run?
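
For anyone trying to narrow this down, the malformed pattern from the report (a closing quote followed by something other than a separator or end-of-line, as in "{text}".") can be located with a small pre-scan. The following is only a rough sketch using plain Java, not Jackson's decoder logic; the class name StrayQuoteCheck and the method firstStrayAfterQuote are hypothetical names made up for illustration, and it assumes a comma separator, double-quote quoting, doubled quotes as the escape, and no line breaks inside fields.

```java
public class StrayQuoteCheck {

    /**
     * Hypothetical helper (not part of Jackson): returns the index of the first
     * character that follows a closing quote but is neither the separator nor
     * end-of-line, or -1 if no such character is found. A doubled quote ("")
     * inside a quoted field is treated as an escaped quote.
     */
    public static int firstStrayAfterQuote(String line, char separator) {
        int n = line.length();
        int i = 0;
        while (i < n) {
            if (line.charAt(i) == '"') {
                i++; // step past the opening quote
                while (i < n) {
                    if (line.charAt(i) == '"') {
                        if (i + 1 < n && line.charAt(i + 1) == '"') {
                            i += 2; // escaped quote, still inside the field
                            continue;
                        }
                        break; // closing quote
                    }
                    i++;
                }
                if (i >= n) {
                    return -1; // unterminated quote: out of scope for this sketch
                }
                i++; // step past the closing quote
                if (i < n && line.charAt(i) != separator) {
                    return i; // something other than separator/end-of-line
                }
            } else {
                // unquoted field: skip ahead to the separator (or end of line)
                while (i < n && line.charAt(i) != separator) {
                    i++;
                }
            }
            i++; // step past the separator
        }
        return -1;
    }

    public static void main(String[] args) {
        // Shortened version of the pattern from the report.
        String bad = "1,\"a\",\"b\".\",2";
        // Well-formed line that uses doubled quotes as escapes.
        String good = "1,\"a\",\"b \"\"q\"\" c\",2";
        System.out.println(firstStrayAfterQuote(bad, ','));
        System.out.println(firstStrayAfterQuote(good, ','));
    }
}
```

A check like this could be used to log or set aside suspect lines before handing the file to the CSV parser. It does not address the recovery behaviour of MappingIterator itself.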